Is Clustering Really Necessary?
After managing Proxmox across multiple locations, I've found that independent nodes can work better than forced clustering. The question is how to replace the cluster features you give up, and the sections below cover the tools I use for that.
Proxmox Datacenter Manager: A Solution?
Proxmox Datacenter Manager (PDM) is the official solution for managing independent nodes and clusters from one interface. It is written in Rust, which lets it query thousands of endpoints concurrently with little resource overhead.
I connect each standalone node to PDM using API tokens. PDM treats every node as a 'remote' and aggregates CPU, RAM, and storage metrics into a central dashboard. The global search function lets me find any VM across dozens of locations by name or IP address. The trade-off is that the tokens and the dashboard itself become things to secure.
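To make the idea of a 'remote' concrete, here is a minimal Python sketch of the kind of aggregation and search described above. It is not PDM's code or API: the `Remote` and `Guest` types and the `aggregate` and `search` helpers are hypothetical names, and the data is assumed to have been fetched from each node already. The only Proxmox-specific detail is the `PVEAPIToken=USER@REALM!TOKENID=SECRET` header format that Proxmox VE expects for API tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Guest:
    vmid: int
    name: str
    ip: str


@dataclass
class Remote:
    """One standalone node as a central dashboard might model it (hypothetical)."""
    name: str
    cpu_cores: int
    mem_used_gib: float
    mem_total_gib: float
    guests: list = field(default_factory=list)


def token_header(user: str, token_id: str, secret: str) -> dict:
    """Build the Authorization header for a Proxmox VE API token.

    `user` includes the realm, for example "monitor@pve".
    """
    return {"Authorization": f"PVEAPIToken={user}!{token_id}={secret}"}


def aggregate(remotes) -> dict:
    """Sum per-node figures into one fleet-wide summary."""
    return {
        "cpu_cores": sum(r.cpu_cores for r in remotes),
        "mem_used_gib": sum(r.mem_used_gib for r in remotes),
        "mem_total_gib": sum(r.mem_total_gib for r in remotes),
        "guest_count": sum(len(r.guests) for r in remotes),
    }


def search(remotes, query: str) -> list:
    """Return (remote name, guest) pairs whose name or IP contains the query."""
    q = query.lower()
    return [
        (r.name, g)
        for r in remotes
        for g in r.guests
        if q in g.name.lower() or q in g.ip
    ]
```

Because VM IDs are only unique per node here, the search returns the remote name alongside each guest. Two nodes can both have a VM 100 without any conflict.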
Potential Risks and Benefits
Here are some potential risks and benefits to consider:
- Potential Risks:
  - Security breaches due to API token management
  - Data inconsistencies across nodes
  - Dependence on a single central dashboard
- Benefits:
  - Improved scalability without clustering
  - Simplified management using PDM
  - Reduced complexity compared to traditional clustering
Proxmox Backup Server: A Reliable Solution?
My central Proxmox Backup Server (PBS) acts as more than a backup destination. It becomes the data exchange hub for all independent nodes: each Proxmox node connects to PBS and sends deduplicated backups on schedule.
The migration workflow through PBS is reliable. To move a VM between nodes that can't communicate directly, I back it up from the source node to PBS and restore it on the destination node.
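The two steps can be sketched as a pair of command builders. This is a sketch under assumptions rather than my exact tooling: it assumes the PBS datastore is added to both nodes as a storage with the same name (`pbs-central` in the usage note is a made-up name), and `backup_command` and `restore_command` are hypothetical helpers. The commands themselves are the stock `vzdump` and `qmrestore` tools, and the volume ID follows the `STORAGE:backup/vm/VMID/TIMESTAMP` form Proxmox VE uses for PBS snapshots.

```python
import shlex


def backup_command(vmid: int, pbs_storage: str) -> list:
    """Command to run on the SOURCE node: snapshot-mode backup to PBS."""
    return ["vzdump", str(vmid), "--storage", pbs_storage, "--mode", "snapshot"]


def restore_command(vmid: int, pbs_storage: str, snapshot_time: str,
                    target_storage: str, new_vmid=None) -> list:
    """Command to run on the DESTINATION node: restore that backup from PBS.

    `snapshot_time` is the backup timestamp as PBS lists it,
    for example "2024-05-01T02:00:00Z".
    """
    volume = f"{pbs_storage}:backup/vm/{vmid}/{snapshot_time}"
    target_id = vmid if new_vmid is None else new_vmid
    return ["qmrestore", volume, str(target_id), "--storage", target_storage]


def as_shell(command: list) -> str:
    """Render a command list as a safely quoted shell line."""
    return shlex.join(command)
```

For example, `as_shell(backup_command(100, "pbs-central"))` gives the line to run on the source node, and `restore_command(100, "pbs-central", "2024-05-01T02:00:00Z", "local-zfs", new_vmid=200)` builds the restore for a destination where ID 100 is already taken.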
Disaster Recovery and Replication
For disaster recovery, PBS remote sync pulls backups from the branch office PBS instances to the main datacenter. If a remote site goes down, you still have recent backups centrally.
For nodes running ZFS storage, pve-zsync handles automated replication over SSH. This tool creates snapshots at defined intervals and sends incremental changes to a secondary node.
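The snapshot-and-incremental-send cycle can be modelled in a few lines of Python. This is a simplified model of the general technique, not pve-zsync's actual implementation: `plan_sync` is a hypothetical helper, snapshot names are assumed to sort chronologically, and `maxsnap` mirrors the idea of keeping only the newest N snapshots.

```python
def plan_sync(source_snaps, dest_snaps, new_snap, maxsnap):
    """Plan one replication run for a single dataset.

    Returns which snapshot to send, which common snapshot to use as the
    incremental base (None means a full send), and which old source
    snapshots can be pruned once the send has finished.
    """
    if maxsnap < 1:
        raise ValueError("maxsnap must be at least 1")

    # An incremental send needs a snapshot that exists on both sides.
    dest = set(dest_snaps)
    common = [s for s in source_snaps if s in dest]
    base = max(common) if common else None

    # Keep only the newest `maxsnap` snapshots, counting the new one.
    all_snaps = sorted(set(source_snaps) | {new_snap})
    prune = all_snaps[:-maxsnap]

    return {
        "send_to": new_snap,
        "send_from": base,
        "incremental": base is not None,
        "prune": prune,
    }
```

The first run has no common snapshot, so it falls back to a full send. Every later run only transfers the blocks that changed since the newest snapshot both nodes share, which is what keeps the SSH traffic small.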
Infrastructure as Code: The Way Forward?
Without a cluster’s shared configuration database, Infrastructure as Code becomes the source of truth. Terraform provisions VMs, while Ansible configures both the Proxmox hosts and the guest systems.
Terraform handles multiple standalone nodes through provider aliasing. Instead of relying on a single Proxmox provider, you define one provider block per standalone node, each with its own alias.
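One way to keep those provider blocks in step with a node inventory is to generate them. The sketch below writes Terraform's JSON configuration syntax (a `.tf.json` file), where several configurations of one provider are expressed as a list. Treat the details as assumptions: `provider_blocks` and `render` are hypothetical helpers, the argument name `pm_api_url` is taken from the Telmate Proxmox provider and other providers use different names, and credentials are left out on purpose.

```python
import json


def provider_blocks(nodes: dict) -> dict:
    """Build one aliased provider configuration per standalone node.

    `nodes` maps a Terraform alias (for example "branch_a") to the
    node's hostname or IP address.
    """
    return {
        "provider": {
            "proxmox": [
                {
                    "alias": alias,
                    # Argument name assumed from the Telmate provider.
                    "pm_api_url": f"https://{host}:8006/api2/json",
                }
                for alias, host in sorted(nodes.items())
            ]
        }
    }


def render(nodes: dict) -> str:
    """Serialize the configuration, ready to save as providers.tf.json."""
    return json.dumps(provider_blocks(nodes), indent=2)
```

A resource then picks its target node with `provider = proxmox.branch_a`, so adding a node is one more entry in the inventory dictionary.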
Adding a new node only requires adding one provider block. Existing modules can then deploy to it by referencing the new alias.
Ansible takes care of the Proxmox hosts themselves. User management is one of the biggest pain points with independent nodes, because creating an account on one node does not propagate anywhere else.
A central YAML file defines users and permissions, and a single playbook ensures every node matches it.
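The reconciliation that playbook performs can be sketched in Python. This is an illustration of the logic, not the playbook itself: `reconcile` is a hypothetical helper, the desired state is a plain dictionary standing in for the parsed YAML file, and each user is assumed to hold one role on the root path `/`. The generated commands use the standard `pveum` CLI.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the pveum commands that bring one node in line with `desired`.

    Both arguments map a user ID such as "alice@pve" to a role name.
    """
    commands = []

    # Create accounts that exist centrally but not on this node.
    for user in sorted(desired.keys() - actual.keys()):
        commands.append(["pveum", "user", "add", user])

    # Grant the desired role; drop a stale role if the user had another one.
    for user in sorted(desired):
        old_role, new_role = actual.get(user), desired[user]
        if old_role == new_role:
            continue
        if old_role is not None:
            commands.append(["pveum", "acl", "delete", "/",
                             "--users", user, "--roles", old_role])
        commands.append(["pveum", "acl", "modify", "/",
                         "--users", user, "--roles", new_role])

    # Remove accounts that are no longer in the central definition.
    for user in sorted(actual.keys() - desired.keys()):
        commands.append(["pveum", "user", "delete", user])

    return commands
```

When a node already matches the central definition, the function returns an empty list. That is the same idempotency you want from the playbook: running it twice changes nothing the second time.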
Managing multiple Proxmox nodes without clustering isn't a workaround or compromise. For many deployments, it's the better architecture.
Geographic distribution works without fighting Corosync latency limits, and hardware heterogeneity also becomes possible since nodes don't share a failure domain.
A problem on one node stays isolated instead of threatening quorum.