Failed to join cluster, now I can't even log in to the web GUI (r/Proxmox)

Question

Hey there,

Let me preface this by saying that I'm very new to Proxmox. I've only been playing around with it for a week or two.

Anyway, here's the situation. I have Server1, a Proxmox server that I'm using to teach myself. I wanted to add a second server, let's call it Server2, on which I was planning to virtualize UnRaid and Proxmox Backup Server (the latter purely to back up Server1). I created a cluster on Server1 and set up the UnRaid VM on Server2, but when I tried to join Server2 to the cluster, I got the following error: "permission denied - invalid PVE ticket (401)". After that, I couldn't access the web GUI on Server2 either.

(Side note: the VMs on Server1 wouldn't start anymore, since something was wrong with the cluster. Server1 had somehow still registered Server2 as a member, but since Server2 wasn't actually there, it just went "nope!" Deleting the Server2 node fixed that.)

I did a quick fresh install of Proxmox on Server2 and tried joining the cluster again, but I'm getting the same error, and again I can't access the interface. I then tried some things I found on the Proxmox forums and in some posts here:

# systemctl stop pve-cluster
# rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
# systemctl start pve-cluster

# systemctl start pve-cluster

# pvecm updatecerts

# systemctl restart pvedaemon pveproxy

But I'm not making any progress. I also came across the command journalctl -xe, which gave the following output, including a few errors:

Nov 07 00:33:40 r720 pve-ha-lrm[1646]: unable to write lrm status file - unable to open file '/etc/pve/nodes/r720/lrm_status.tmp.1646' - No such file or directory
Nov 07 00:33:41 r720 pvestatd[1446]: authkey rotation error: cfs-lock 'authkey' error: pve cluster filesystem not online.
Nov 07 00:33:42 r720 pveproxy[3037]: worker exit
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3037 finished
Nov 07 00:33:43 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3061 started
Nov 07 00:33:43 r720 pveproxy[3038]: worker exit
Nov 07 00:33:43 r720 pveproxy[3039]: worker exit
Nov 07 00:33:43 r720 pveproxy[3061]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3038 finished
Nov 07 00:33:43 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3064 started
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3039 finished
Nov 07 00:33:43 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:43 r720 pveproxy[1636]: worker 3065 started
Nov 07 00:33:43 r720 pveproxy[3064]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:43 r720 pveproxy[3065]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:45 r720 pve-ha-lrm[1646]: unable to write lrm status file - unable to open file '/etc/pve/nodes/r720/lrm_status.tmp.1646' - No such file or directory
Nov 07 00:33:48 r720 pveproxy[3061]: worker exit
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3061 finished
Nov 07 00:33:48 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3068 started
Nov 07 00:33:48 r720 pveproxy[3064]: worker exit
Nov 07 00:33:48 r720 pveproxy[3065]: worker exit
Nov 07 00:33:48 r720 pveproxy[3068]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3064 finished
Nov 07 00:33:48 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3069 started
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3065 finished
Nov 07 00:33:48 r720 pveproxy[1636]: starting 1 worker(s)
Nov 07 00:33:48 r720 pveproxy[1636]: worker 3070 started
Nov 07 00:33:48 r720 pveproxy[3069]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:48 r720 pveproxy[3070]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
Nov 07 00:33:50 r720 pve-ha-lrm[1646]: unable to write lrm status file - unable to open file '/etc/pve/nodes/r720/lrm_status.tmp.1646' - No such file or directory
Nov 07 00:33:51 r720 pvestatd[1446]: authkey rotation error: cfs-lock 'authkey' error: pve cluster filesystem not online

But I'm not really sure what any of that means, let alone what to do about it.
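Most of the log above is the same few messages repeating with different worker PIDs. A minimal sketch for collapsing a saved copy of such a log into distinct messages with counts, assuming journalctl's default "short" format seen above (timestamp, host, unit[pid]: message). The function name summarize_journal and the file name journal.txt are hypothetical, not Proxmox tools:

```shell
#!/usr/bin/env bash
# Sketch: collapse a saved journal excerpt into distinct messages with counts.
# Assumes journalctl's default "short" format:
#   Mon DD HH:MM:SS host unit[pid]: message
summarize_journal() {
  awk '{
    unit = $5
    sub(/(\[[0-9]+\])?:$/, "", unit)        # "pveproxy[3037]:" -> "pveproxy"
    msg = ""
    for (i = 6; i <= NF; i++) msg = msg (i > 6 ? " " : "") $i
    gsub(/worker [0-9]+/, "worker N", msg)  # "worker 3061 started" -> "worker N started"
    print unit ": " msg
  }' "$1" | sort | uniq -c | sort -rn       # most frequent messages first
}
```

For example, save the log with journalctl -b --no-pager > journal.txt, then run summarize_journal journal.txt. Leaving out -x avoids the explanatory catalog lines, which this sketch does not try to parse.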

Any help would be much appreciated.

Server1:

Cluster information
-------------------
Name:             CasaFontaine
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov  7 01:21:22 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.5
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.4.203 (local)

Server2:

root@r720:~# pvecm status
Cluster information
-------------------
Name:             CasaFontaine
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov  7 01:17:57 2024
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.a
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.1.220 (local)
root@r720:~#

EDIT: I don't know what the line "0x00000001 1 192.168.4.203 (local)" in Server1's output is supposed to mean, but 192.168.4.203 is on the direct SFP+ link to my TrueNAS server:

https://i.gyazo.com/8c6a1d7b039e8164c922ef78789f4433.png

EDIT2: I guess that's where the problem started:

https://i.gyazo.com/90fc1b9afbf9b3341499dba334d3cfbd.png
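The two pvecm status outputs above report the same cluster name (CasaFontaine) but different Config Version values (5 vs 4) and different Node IDs, and each lists only itself under Membership information. A minimal sketch for putting those fields side by side, assuming each node's pvecm status output has been saved to a text file (the function name pvecm_fields and the file names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: print selected "Key: value" fields from saved `pvecm status` output
# as key=value lines, so two nodes' outputs can be compared with diff.
pvecm_fields() {
  # Split on a colon followed by any run of spaces. Only the anchored keys
  # below are kept, so the Date line (which contains more colons) is skipped.
  awk -F': *' '/^(Name|Config Version|Nodes|Node ID|Expected votes):/ { print $1 "=" $2 }' "$1"
}

# Usage (hypothetical file names):
#   diff <(pvecm_fields server1.txt) <(pvecm_fields server2.txt)
# The <( ) process substitution requires bash rather than plain sh.
```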