Proxmox – ZFS – Dead drive on active VM, recover from replicated disk
So you run a Proxmox cluster with 3 nodes.
Your VMs are all replicated using ZFS replication, and in HA mode.
Each ZFS volume runs on a single drive, because we don't have too much money and it's a home setup. OR, your RAID back end went nuts and you lost a full physical volume.
The issue is that your VM did not migrate through HA to another node, because obviously the PVE node itself was not down; only one of the ZFS drives died.
Then I had a major power outage, and a second one, ending after auto reboot with some VMs on one node, some on another, and this specific VM (my Nextcloud), with a 1TB virtual disk, sitting on the dead physical disk.
At this stage, I have a failed VM on an active node, without a valid disk for the VM. BUT, thanks to the cluster, « zfs list » on the other nodes shows that the replicated copies are still there!
So here are the steps, which turned out easier and faster than expected:
Move the VM manually to another node of the cluster. This is very simple; SSH with a shell to any PVE node:
mv /etc/pve/nodes/pve1/qemu-server/106.conf /etc/pve/nodes/pve6/qemu-server/
Now refresh the web interface, and the VM is back on a node that has a valid disk for it! …but it won't start, because it was in a failed state at the HA level. Not an issue: on the VM, open the « More » menu, choose « Manage HA », set it to « disabled », and validate. Then, in the same menu, set it back to « started ».
The VM should start again, using the replica of the disk!
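If you prefer the shell over the web interface, the same HA state reset can be done with ha-manager from any node. A sketch, assuming the VM ID 106 from the config file above:

```shell
# Clear the failed HA state by disabling the resource...
ha-manager set vm:106 --state disabled

# ...then ask HA to start it again on its new node
ha-manager set vm:106 --state started

# Verify the HA resource state
ha-manager status
```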
Now, on the node that had the failed disk (I have SATA drives), I just hot-unplugged the dead unit and put in a new disk.
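Before touching anything, it's worth identifying the freshly inserted disk; the new unit should show up empty, without partitions (device names below are just examples):

```shell
# List block devices; the new disk should appear with no partitions on it
lsblk

# Stable by-id names are safer than /dev/sdX when creating a ZFS pool
ls -l /dev/disk/by-id/
```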
Initialize the new disk with a GPT partition table; we'll need to add it to ZFS. The issue is that the storage entry already exists. So go to « Datacenter », then « Storage », double-click the relevant ZFS pool, and uncheck the node that had the failed drive.
Validate, then add the disk on that node, under ZFS, using the same ZFS pool name.
Now you have a valid new ZFS volume.
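The same steps can be sketched from the shell, assuming (placeholder names, adapt to your setup) a storage entry called local-zfs-repl, a pool named zfspool1, nodes pve1/pve6, and the new disk at /dev/disk/by-id/ata-NEWDISK:

```shell
# Restrict the storage definition to the healthy node only
# (web UI equivalent: Datacenter > Storage > edit > uncheck the node)
pvesm set local-zfs-repl --nodes pve6

# On the node with the replacement disk: create the pool
# with the SAME name as before, on the new disk
zpool create -f zfspool1 /dev/disk/by-id/ata-NEWDISK

# Re-add the node to the storage definition
pvesm set local-zfs-repl --nodes pve1,pve6
```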
Make sure you run the replication from the currently running node to this empty volume, so that you have the data replicated again!
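Replication jobs can be checked and triggered immediately with pvesr instead of waiting for the schedule. A sketch, assuming a job ID of 106-0 (VM ID plus job number, an assumption here):

```shell
# List replication jobs and their last sync state
pvesr status

# Trigger the replication job right now
pvesr schedule-now 106-0
```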
I was a bit stressed; even though I have a filesystem-level backup of my VM content on another machine with BackupPC, it's always much, much easier and quicker to get the existing VM back on track.
I hope this helps. I realized at least how easy and simple it is to move a VM that is not running from one host to another.
Have a great day, and to hell with power outages and dead drives! (Yes, I have a UPS and surge protection… still.)
2 Comments on Proxmox – ZFS – Dead drive on active VM, recover from replicated disk
Hi, you saved my life today!
Same issue, without HA but with scheduled replication.
Just moved the VM to the healthy node; it automatically connected to the replicated disk, which was not crashed. Thanks!
My pleasure; it's always great to know that some shared knowledge helped someone else!
These are also memos for myself, so I know where to look if this happens again :P