Trying to remove node from cluster - confused at the output of "pvecm delnode" command

I'm trying to remove a node from my cluster. I'm following the guide located here: https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

Prior to doing anything, here is my output from "pvecm nodes" and "pvecm status" commands:

root@nodeA:/# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 nodeA (local)
         2          1 nodeB
         3          1 nodeC

root@nodeA:/# pvecm status
Cluster information
-------------------
Name:             proxmox-cluster
Config Version:   11
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Sep  8 19:05:27 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.214b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.10 (local)
0x00000002          1 192.168.0.15
0x00000003          1 192.168.0.20

I'm trying to remove nodeC (node ID 3) from the cluster, so I powered down the node first and then ran "pvecm delnode nodeC". I expected to see a simple "Killing node 3" message, but instead I received the following error:

root@nodeA:/# pvecm delnode nodeC
Killing node 3
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 3' failed: exit code 1

Since the error mentioned some sort of lock issue, I thought maybe a file was still locked, so I ran it a second time and received a different error:

root@nodeA:/# pvecm delnode nodeC
error during cfs-locked 'file-corosync_conf' operation: Node/IP: nodeC is not a known host of the cluster.

So it looks like maybe the node was successfully removed? But I'm anxious about the errors I received, because I will eventually be adding this node back to the cluster (after a full reinstall of Proxmox) and don't want to be left in an inconsistent state.

Here is what my "pvecm nodes" and "pvecm status" commands currently output:

root@nodeA:/# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 nodeA (local)
         2          1 nodeB

root@nodeA:/# pvecm status
Cluster information
-------------------
Name:             proxmox-cluster
Config Version:   12
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Sep  8 19:16:40 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.214f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.0.10 (local)
0x00000002          1 192.168.0.15

Am I in a good clean state after this removal now? Anything else I should check first?

[SOLVED] Unable to properly remove node from cluster

Try this thread on the Proxmox forums. Based on that thread, you may need to run rm -rf /etc/pve/nodes/nodeC and rm -rf /etc/pve/priv/lock/ha_agent_nodeC_lock/ to finish up.
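Roughly, that finishing-up step would look like this (a sketch based on that thread; double-check the node name before deleting anything, and the HA lock directory may not exist if HA was never used on that node):

# run on a remaining cluster node, after "pvecm delnode" has already been run
rm -rf /etc/pve/nodes/nodeC                      # leftover node directory (this is what keeps the node visible in the web UI)
rm -rf /etc/pve/priv/lock/ha_agent_nodeC_lock/   # stale HA agent lock, if it exists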

Awesome, thanks! I'll give that a shot when I'm back in the lab tomorrow.

Proxmox is a bit weird, and this is the method I've developed for when I need to do cluster work.

Each cluster node gets this bash file:

#!/bin/bash
# Run on a node AFTER it has been removed from the cluster.
systemctl stop pve-cluster corosync   # stop the cluster filesystem and corosync
pmxcfs -l                             # restart pmxcfs in local (standalone) mode
rm -r /etc/corosync/*                 # remove the corosync config and authkey
rm -r /etc/pve/corosync.conf          # remove the cluster config from pmxcfs
killall pmxcfs                        # stop the local-mode pmxcfs
systemctl start pve-cluster           # start pve-cluster normally, now standalone

This script clears out the corosync settings on a node I've removed and resets the services for regular, non-clustered use; I've used it multiple times with success. It is run on a node AFTER it has been removed from the cluster.

On the cluster side of things, I run the usual "pvecm delnode $nodename" once, wait a bit for corosync to propagate, and then remove the node's directory from /etc/pve/nodes to clear it from the web UI. That usually propagates cleanly, and within 30 minutes the node is safely out of the cluster.
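In command form, that cluster-side sequence is roughly (a sketch, using the OP's nodeC as the example; run on a node that stays in the cluster, with the removed node already powered off):

pvecm delnode nodeC          # removes the node from corosync.conf and bumps the config version
# ...wait a bit for corosync to propagate the new config...
rm -r /etc/pve/nodes/nodeC   # clears the leftover directory so the node disappears from the web UI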

Only on one occasion when I was new (and drinking) have I had to completely reconstruct a cluster, but it's not too bad if you absolutely have to.

At a glance at your output, I think you are good, depending on whether you want to do any further cleanup as I do above.

Edit: Apparently code block doesn't want more than one line tonight.

I followed other suggestions in this thread and noticed that Proxmox still thought the node was part of an existing cluster under Datacenter > Cluster, so I figured I'd give your script a try. Worked perfectly: Proxmox no longer thinks it's part of a cluster and everything is good with the world again. Thanks!

Cleaned up an aborted node removal on the (unremoved) node and got it back to default with your commands! Thanks.

pvecm delnode LaptopPVE
trying to acquire cfs lock 'file-corosync_conf' ...
Killing node 2
unable to open file '/etc/pve/corosync.conf.new.tmp.3086546' - Permission denied

That's the error I got, at least. Locally the other node was removed, but in the GUI everything was still borked. This fixed it.

so I powered down the node first

You can't mess with the cluster when it is degraded.

error during cfs-locked 'file-corosync_conf'

When the cluster is degraded, /etc/pve (and with it the corosync config) goes read-only.

Power on all the nodes and follow the manual.

https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

Edit: Getting my shit mixed up.

I am following the manual though. The first thing the manual says to do is power off the node you are removing:

At this point you must power off hp4 and make sure that it will not power on again (in the network) as it is.

On a side note: if one of the two remaining nodes is off, you might want to give the surviving node two votes temporarily so you can still start containers and VMs.

Edit: Clarified.
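For example (a sketch with hypothetical values, not taken from the OP's setup): you could temporarily raise quorum_votes for the surviving node in /etc/pve/corosync.conf, remembering to also increment config_version, and revert it once the other node is back:

# excerpt of /etc/pve/corosync.conf -- hypothetical values
nodelist {
  node {
    name: nodeA
    nodeid: 1
    quorum_votes: 2        # temporarily raised from 1 so this node alone keeps quorum
    ring0_addr: 192.168.0.10
  }
  # (other node entries unchanged)
}

Another common workaround is lowering the expected votes with "pvecm expected 1", but that's a separate technique from the vote change described above.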

Hmm, is that an HA thing? I'm not running in HA mode, and I've had no problems starting/stopping VMs in the past.

Edit: "In the past" meaning back when I ran it as a 2-node cluster.
