Trying to remove node from cluster - confused by the output of "pvecm delnode" command
I'm trying to remove a node from my cluster. I'm following the guide located here: https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
Prior to doing anything, here is the output of the "pvecm nodes" and "pvecm status" commands:
root@nodeA:/# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 nodeA (local)
         2          1 nodeB
         3          1 nodeC

root@nodeA:/# pvecm status

Cluster information
-------------------
Name:             proxmox-cluster
Config Version:   11
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Sep 8 19:05:27 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.214b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.10 (local)
0x00000002          1 192.168.0.15
0x00000003          1 192.168.0.20
I'm trying to remove nodeC (node ID 3) from the cluster, so I powered down the node first and then ran "pvecm delnode nodeC". I expected to see a simple "Killing node 3" message, but I received the following error:
root@nodeA:/# pvecm delnode nodeC
Killing node 3
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 3' failed: exit code 1
I tried running it a second time, since the message mentioned some sort of lock issue and I thought maybe a file was still locked, but I received a different error:
root@nodeA:/# pvecm delnode nodeC
error during cfs-locked 'file-corosync_conf' operation: Node/IP: nodeC is not a known host of the cluster.
So it looks like maybe it was successfully removed after all? But I'm anxious about the errors I received, because I will eventually be adding this node back to the cluster (after a full reinstall of Proxmox), and I don't want to end up with any inconsistent state errors.
Here is what my "pvecm nodes" and "pvecm status" commands currently output:
root@nodeA:/# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 nodeA (local)
         2          1 nodeB

root@nodeA:/# pvecm status

Cluster information
-------------------
Name:             proxmox-cluster
Config Version:   12
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Sep 8 19:16:40 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.214f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.10 (local)
0x00000002          1 192.168.0.15
Am I in a good clean state after this removal now? Anything else I should check first?
Comments Section
[SOLVED] Unable to properly remove node from cluster
try this thread on the Proxmox forums. You may need to run
rm -rf /etc/pve/nodes/nodeC
rm -rf /etc/pve/priv/lock/ha_agent_nodeC_lock/
to finish up, based on that thread.

Awesome, thanks! I'll give that a shot when I'm back in the lab tomorrow.
Proxmox is a bit weird, and this is the method I've developed for when I need to do cluster work.
Each cluster node gets this bash file:
This code will clear out the corosync settings from a node I remove. It resets the services for regular no-cluster usage, and I've used it multiple times with success. It is run on a node AFTER it has been removed from the cluster.
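The script itself didn't make it into the comment (see the edit note at the end), but a minimal sketch of that kind of cleanup, along the lines of the usual "separate a node without reinstalling" steps from the Proxmox wiki, might look like this:

#!/bin/bash
# Sketch only: reset a node to standalone operation after it has already
# been removed from the cluster. Run ON the removed node itself.

# Stop the cluster filesystem and corosync
systemctl stop pve-cluster corosync

# Start pmxcfs in local mode so /etc/pve is writable without quorum
pmxcfs -l

# Remove the corosync configuration from /etc/pve and from disk
rm /etc/pve/corosync.conf
rm -rf /etc/corosync/*

# Stop the local-mode pmxcfs and bring pve-cluster back up standalone
killall pmxcfs
systemctl start pve-cluster

After this the node behaves as a standalone Proxmox host again.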
On the cluster side of things I run the usual "pvecm delnode $nodename" once, wait a bit for corosync to propagate, and then remove the node's directory from /etc/pve/nodes so it disappears from the web UI. That usually propagates, and within 30 minutes the node is safely out of the cluster.
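A sketch of those two cluster-side steps, assuming the removed node is named nodeC:

# Run on a remaining cluster member, with nodeC already powered off
pvecm delnode nodeC

# Once the change has propagated, drop the stale directory so the node
# also disappears from the web UI
rm -rf /etc/pve/nodes/nodeC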
Only on one occasion when I was new (and drinking) have I had to completely reconstruct a cluster, but it's not too bad if you absolutely have to.
At a glance at your corosync file, I think you are good; it just depends on whether you want to do any further cleanup like I describe above.
Edit: Apparently code block doesn't want more than one line tonight.
I followed other stuff in this thread and noticed that Proxmox still thought I was part of an existing cluster under Datacenter > Cluster, so I figured I'd give your code a try. Worked perfectly - Proxmox now doesn't think I'm part of a cluster and everything is good with the world again. Thanks!
Cleaned up an aborted node removal on the (still unremoved) node and got it back to defaults with your commands! Thanks.
trying to acquire cfs lock 'file-corosync_conf' ...
Killing node 2
unable to open file '/etc/pve/corosync.conf.new.tmp.3086546' - Permission denied
At least here's the error I got. Locally the other node was removed, but in the GUI everything was still borked. This fixed it.
You can't mess with the cluster when it is degraded. When degraded, the corosync file goes read-only. Power on all the nodes and follow the manual: https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
Edit: Getting my shit mixed up.
I am following the manual though. The first thing the manual says to do is power off the node you are removing.
On a side note, if one of the two nodes is off, you might want to give the remaining node two votes temporarily so you can still start containers and VMs.
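A sketch of what that could look like in /etc/pve/corosync.conf, using the node names and addresses from the original post purely as placeholders (make the edit while the cluster is still quorate, and bump config_version in the totem section):

nodelist {
  node {
    name: nodeA
    nodeid: 1
    quorum_votes: 2   # temporarily two votes so nodeA alone keeps quorum
    ring0_addr: 192.168.0.10
  }
  node {
    name: nodeB
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.15
  }
}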
Edit: Clarified.
Hmm, is that an HA thing? I'm not running in HA mode, and I've had no problems starting/stopping VMs in the past.
Edit: "In the past" meaning back when I ran it as a 2-node cluster.