Technology
Proxmox – ZFS – Dead drive on active VM, recover from replicated disk
So you run a Proxmox cluster, and you have 3 nodes.
Your VMs are all replicated using ZFS replication, and running in HA mode.
Each ZFS volume runs on a single drive, because money is tight and it's a home setup, OR your RAID back end went nuts and you lost a full physical volume.
The issue is that your VM did not migrate through HA to another node, because the PVE node itself was not down: only one of the ZFS drives died.
Then I had a major power outage, and a second one, which ended, after the auto reboot, with some VMs on one node, some on another, and this specific VM (my Nextcloud), with a 1TB virtual disk, sitting on the dead physical disk.
At this stage, I have a failed VM on an active node, without a valid disk for the VM. BUT, thanks to the cluster, « zfs list » on the other nodes shows that the replicated copies are still there!
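For instance, from a shell on one of the surviving nodes, something like this should show the replicated disk (a sketch; the VM ID 106 matches the config file moved below, the pool name will be yours):
zfs list | grep vm-106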
So here are the steps, which turned out to be easier and faster than expected:
Move the VM manually to another node of the cluster. This is very simple: SSH into a shell on any PVE node and run:
mv /etc/pve/nodes/pve1/qemu-server/106.conf /etc/pve/nodes/pve6/qemu-server/
Now refresh the web interface, and the VM is back on a node that has a valid disk for it! ...but it won't start, because it was in a failed state at the HA level. Not an issue: on the VM, open the « More » menu, choose « Manage HA », set it to « disabled », and validate. Then, in the same menu, set it back to « started ».
The VM should start again, using the replica of the disk!
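The same HA reset can also be done from a shell (a sketch; vm:106 matches the VM ID used above):
#clear the failed HA state, then request a start
ha-manager set vm:106 --state disabled
ha-manager set vm:106 --state started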
Now, on the node that had the failed disk: I have SATA drives, so I just hot unplugged the dead unit and put in a new disk.
Initialize the new disk as GPT; then we need to add it to ZFS. The issue is that the volume already exists. So we need to go to « Datacenter », then « Storage », double click the needed ZFS pool, and uncheck the node that had the failed drive.
Validate, and go add the disk on the node, in ZFS, with the same ZFS pool name.
Now you have a valid new ZFS volume.
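If you prefer doing this part from the shell, the equivalent is roughly the following (a sketch; /dev/sdb and the pool name tank are hypothetical, use your own disk and the SAME pool name as before):
#wipe the replacement disk and recreate the pool
sgdisk --zap-all /dev/sdb
zpool create -f tank /dev/sdb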
Make sure you run the replication from the currently running node to this empty volume!!! So that you have the data replicated again.
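You can check and trigger the replication job from the shell as well (a sketch; the job id 106-0 is hypothetical, « pvesr status » shows the real ones):
pvesr status
pvesr schedule-now 106-0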
I was a bit stressed; even though I have a backup of my VM content at the filesystem level with BackupPC on another machine, it's always much, much easier and quicker to get the existing VM back on track.
I hope this will help you. I realized, at least, how easy and simple it is to move a VM that is not running from one host to another.
Have a great day, and to hell with power outages and dead drives! (Yes, I have a UPS and surge protection... still).
Restrict access to the Proxmox web admin interface
When you install Proxmox, you have by default access to your admin interface, usually using the following URL:
https://IP-OF-YOUR-INSTANCE:8006
This is totally fine when using an internal instance, but quite another story if you rent an online server with Proxmox on it. We don't want to leave this access wide open.
While Proxmox comes with a nice firewall, it doesn't allow you to restrict access to ports 8006 and 22 (obviously, so it doesn't lock you out), because it won't do this on your « local » network; but when you are on a publicly connected machine, the « local » network is the internet.
Therefore, a way to restrict this access is by creating the file /etc/default/pveproxy with the following content:
ALLOW_FROM="127.0.0.1,172.20.0.1"
DENY_FROM="all"
POLICY="allow"
Be sure to use lower case. Then restart the pveproxy service:
service pveproxy restart
And from now on, only the listed IPs have access to the admin interface.
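A quick way to test the restriction (hedged: the exact error may vary, but pveproxy should reject any client not matching ALLOW_FROM):
#from an allowed IP this returns the login page, from anywhere else the request is denied
curl -k https://IP-OF-YOUR-INSTANCE:8006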
Reference: https://vdillenschneider.fr/architecture-de-services-avec-proxmox-sur-un-serveur-kimsufi
Migrate your ESXi VMs to Proxmox ZFS
Lately, I migrated my personal lab from the ESXi hypervisor to Proxmox.
Many reasons are behind this move:
– using the free VMware ESXi did not allow me proper HA or replication
– each update was painful, and I got some « CPU no longer supported » warnings
– it is not free and open source, etc.
– I saw Proxmox running at another place and it was looking good
– the need to learn something new, and to have HA in my home lab, as I migrated my workloads from a VPS to home (OVH suspended an old offer without allowing a migration, rather requiring a full re-install, on short notice... tired of not controlling anything, I wanted to move my stack home and have some redundancy).
Basically, this operation was the following:
ESXi 6.7, latest update available in April 2020 (aside from the new major version 7), to Proxmox PVE 6.1.8.
This assumes you have ESXi and Proxmox up and running, able to reach each other, and that you have NO SNAPSHOTS on the VMware ESXi side.
I also assume you have a ZFS volume mounted in your Proxmox.
On the Proxmox node (pve), from the shell, I install SSHFS:
#install SSHFS to mount the esxi volume
apt install sshfs
#create a mount point
mkdir /mnt/ssh
#Mount root directory of esxi on /mnt/ssh of pve node
sshfs root@esxi:/ /mnt/ssh
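You can confirm the mount works by listing the ESXi datastores through it:
ls /mnt/ssh/vmfs/volumes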
#Convert flat VMDK drive to raw image (reading on esxi via SSHFS and output on your local proxmox)
#The command below must run from a volume where you have enough space to store the full-size VMDK (usually /volume-you-created on your proxmox)
qemu-img convert /mnt/ssh/vmfs/volumes/<yourdatastore>/<yourvmname>/<yourvmname>-flat.vmdk -O raw <yourvmname>.raw
#Create the target VM in proxmox with the same specs as on ESXi, making sure you pick the ZFS volume as storage.
#identify the target disk of the created VM in /dev/zvol/<volumeName>/<diskName>
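A quick way to spot it (a sketch; vm-<ID>-disk-<N> is the default Proxmox zvol naming, and 106 is a hypothetical VM ID):
ls -l /dev/zvol/<volumeName>/ | grep vm-106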
#Once you know where your new VM volume is (usually named after the VM ID), you can dump the raw image to the new virtual disk
dd bs=1M if=<yourVMname>.raw of=/dev/zvol/<volumeName>/<diskName>
If your machine is loaded, the command above may stall your ZFS and overload the server (the load can go way above the amount of available threads, due to default ZFS settings not limiting the number of threads).
A workaround I found actually helped get around the issue, using the oflag=direct option:
dd bs=1M oflag=direct if=<yourVMname>.raw of=/dev/zvol/<volumeName>/<diskName>
Hope this helps: when migrating big VMDKs over 1TB, I had ZFS crashes due to this, and the command above allowed me to import them properly.
#When done, start the new VM and see if it boots. The main issue I had, on both CentOS and Debian, was that the network interface name changed, so at first boot the VM has no network.
#Not a big issue: edit the interface name in /etc/network/interfaces (Debian) or /etc/sysconfig/network-scripts/ifcfg-ethX (CentOS)
#The MAC address changed too, unless you forced it while creating the VM.
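On Debian, for example, the fix can be as small as renaming the device in the config (a sketch; ens18 is a common name for the first VirtIO NIC on a Proxmox VM, check yours with « ip a »):
sed -i 's/eth0/ens18/g' /etc/network/interfaces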
#As soon as your VM is up, delete the .raw image from your drive to free up space.
This is how I migrated 8 VMs from ESXi to Proxmox without any issues.
Additional notes following this migration (April 27, 2020):
After migrating my last VM, I had a weird error stating that I did not have enough space to replicate, as the system could not take any snapshot.
This is due to a default ZFS setting in Proxmox where, by default, a space reservation the same size as the actual disk is made on the volume, for snapshots etc.
While fine in most cases, when you have a VM with a 1.66TB drive, it starts to be an issue.
There is more explanation about ZFS refreservation here: http://www.mceith.com/blog/?p=153
Basically, in my case, the refreservation was as big as 1.66TB, not allowing snapshots to be taken, and therefore not allowing VM replication.
You can see the information for a volume with the following command:
zfs get refquota,reservation,refreservation yourvolume/your-vm-disk
So basically, for my VM that had a default refreservation value of 1.66TB, I set it to 500G, since my drive only had 578G of unassigned space.
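Lowering the reservation is a single command (same volume/disk path as in the « zfs get » example above):
zfs set refreservation=500G yourvolume/your-vm-disk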
This allowed the replication process to go on.
Hopefully this will help some, facing the « disk full » issue where the disk is not actually full, but reserved.