Building Redundant Storage¶
After going through a headache with a failed drive, I'm rebuilding my Proxmox lab's storage setup to be redundant.
When Proxmox was installed initially, we used LVM instead of ZFS.
If we had used ZFS for the root filesystem/boot drive, we could have leveraged the internal mirroring that it supports.
But, we didn't, so we're going to set up RAID1 mirroring for the LVM.
We did use ZFS for vmdata, but that came later.
Setting up RAID1 for Root Filesystem¶
For the root filesystem (mounted at /), we will use a RAID1 array.
The root filesystem on my server is mounted via LVM.
Find Your Own Root Filesystem
If you're going to be using this as a guide, make sure to check what your root filesystem is!
You'll see this type of output: for my case, /dev/mapper/pve-root is the root filesystem, which is a logical volume (LV) through LVM. The typical naming convention here is VGname-LVname, so pve-root is the root LV, belonging to the pve volume group (VG).
Find which physical volume is being used for that LV with pvs. In my case, it's /dev/sda3.
Run these commands to find your own root filesystem and its corresponding disk drive.
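A minimal sketch of those checks (plain LVM/util-linux commands; your device and VG names will differ):
findmnt /     # shows which device is mounted at / (for me, /dev/mapper/pve-root)
lvs           # lists logical volumes and the volume group they belong to
pvs           # lists the physical volumes (disk partitions) backing each VG
lsblk         # shows how disks, partitions, and LVs relate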
Now that we've determined that our root filesystem is managed in LVM, and identified which volume group and physical volume are being used, the migration is pretty simple.
There are four main steps here (ultra-simplified):
- Add the new drive, create a degraded RAID1 array from it
- Add the RAID array to LVM
- Migrate all LVM data from old drive to new RAID array
- Remove old drive from LVM, then add it to the RAID array
- These are the conceptual steps; there are many more technical steps in this process.
There's a lot more to it than that, but that's the concept.
The disks that I'll be using on this page (for my case):
- /dev/sda is the main boot disk that contains the root filesystem.
- /dev/sdc is the disk that I'll be using for the backup.
High Level Overview¶
These are the basic steps I took to migrate my root filesystem, which was installed via the Proxmox VE installer with LVM, to a RAID1 array.
These steps would also work with any other Debian-based Linux system installation that was done with LVM. The names of the volume groups will change, but the same fundamental principles will work.
Creating a RAID1 Array for the Root Filesystem¶
This is the truncated version of the writeup. It contains just the basic commands used and brief explanations of what they're doing and the state of the migration.
Root Access Required
These commands assume root access. Use sudo if you're not the root user.
-
Add new physical disk to server.
- In my case, I also needed to use passthrough mode for the disk by rebooting into BIOS and changing the device settings to "Convert to Non-RAID disk" (the hardware RAID controller in my server would prevent the OS from using it).
-
Copy the boot drive's disk partition table to the new drive.
sgdisk --backup=table.sda /dev/sda
sgdisk --load-backup=table.sda /dev/sdc
sgdisk --randomize-guids /dev/sdc   # To make the GUIDs unique
- This mirrors the boot drive's partitions so we don't need to create them manually.
-
Create a degraded RAID1 array using the new disk.
- Degraded, in this context, means leaving the second disk out of the array ("missing").
- /dev/sda3 is where the root filesystem LVM lives, so we use /dev/sdc3 (the backup disk's equivalent).
- Confirm it's been created. We should see our md1 array here with [U_] (one of two devices):
-
Add the RAID device to LVM.
- This adds the RAID array as a physical volume (PV) to be used by LVM.
- The pvs output should look something like this:
-
Add the array to the pve Volume Group.
-
We can transfer all the data from the /dev/sda disk to the /dev/md1 RAID array now that they're in the same volume group.
-
Confirm it's been added. Check the #PV column. It should be 2.
-
-
Live-migrate all LVs from the old PV (/dev/sda3) onto the array.
-
This step will take a while. It will migrate all of the data on /dev/sda3 to /dev/md1.
- You can keep an eye on the elapsed time with this:
-
Once it's done, make sure to confirm the migration succeeded.
-
Check the free space for the old physical volume. Notice the PFree column in relation to PSize.
PV         VG  Fmt  Attr PSize    PFree
/dev/md1   pve lvm2 a--  222.44g  <15.88g
/dev/sda3  pve lvm2 a--  <222.57g <222.57g
The /dev/sda3 PV is now 100% free.
-
-
Remove the original (/dev/sda3) from the pve volume group. Output you should get:
- At this point, LVM only uses /dev/md1. The root LV now physically lives in the RAID array.
-
Save the RAID configuration.
- These make sure that the array is assembled during early boot.
- update-initramfs -u might be dracut --regenerate-all --force on some Red Hat-based distros. We're on Proxmox, though, so that's not relevant here, but it should be noted.
-
Verify before reboot.
You should see:
- This confirms the LV for / is built on top of /dev/md1 (in /dev/sdc3) instead of /dev/sda3.
-
Reboot the system.
Check that the root filesystem is mounted with the RAID device.
-
Once we've confirmed that everything boots correctly and the RAID array is healthy, we can finally add the old boot disk to the mirror.
Then confirm: wait until it finishes syncing. When we see [UU], it means both disks are in sync.
When the RAID entry looks like this, it's complete:

Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[2] sdc3[0]
      233249344 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>
- We can even verify that it's mirrored by looking at lsblk. The /dev/sda3 and /dev/sdc3 entries should look identical. The only difference is the disk name.
Setting up Redundant Boot¶
This is optional: if your boot drive fails and you've already set up the RAID array, your data is safe. But you may need to boot from a recovery image in that situation if you don't set up redundant boot.
We're essentially going to mirror the ESP (EFI System Partition) on the
original boot disk (partition /dev/sda2
) to the new backup disk, but not with
RAID.
This part assumes you've already done the steps above to migrate the root LVM into a RAID array.
-
Identify the boot partitions to turn into ESPs. The /dev/sda2 partition on my machine is mounted to /boot/efi, so /dev/sdc2 will be my backup (since the partition table is mirrored).
-
Format the redundant boot partition as FAT32. It'll be /dev/sdc2 for me.
Verify that the filesystem was created. The FSVER column should be set to FAT32.
-
Mount the new ESP. Just a temp location to sync the boot files.
Verify that it's mounted.
-
Sync the EFI contents from the original boot partition.
-
Include the trailing slashes!
Make sure you include the trailing slashes in the directory names. If you don't, it will copy the efi directory itself, causing the ESP filesystem hierarchy to be incorrect.
-
Verify that both directories have the same contents. No output on the diff is a good thing.
-
-
Install GRUB on both partitions.
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox --recheck
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi2 --bootloader-id=proxmox --recheck
- You can use the same bootloader-id for both entries. If you want it to be more specific (to be able to boot from a specific disk), give the two disks distinct bootloader-id values.
- You can use the same
-
Rebuild the GRUB menu.
-
Add firmware boot entries for both drives to make sure UEFI firmware can boot from either disk.
sudo efibootmgr -c -d /dev/sda -p 2 -L "Proxmox (sda2)" -l '\EFI\proxmox\shimx64.efi'
sudo efibootmgr -c -d /dev/sdc -p 2 -L "Proxmox (sdc2)" -l '\EFI\proxmox\shimx64.efi'
-
Choosing the Right Loader
In this case, the OS loader is shimx64.efi.
This bootloader is primarily for when Secure Boot is enabled in BIOS (UEFI).
If you do not have Secure Boot enabled, you can use grubx64.efi instead of shimx64.efi.
- shimx64.efi: a Microsoft-signed first-stage loader that validates and then chains to GRUB. Required when Secure Boot = ON.
- grubx64.efi: GRUB directly. Works when Secure Boot = OFF (or on systems allowing it).
I chose shimx64.efi since that was the default loader used by Proxmox.
There's really no drawback to using shimx64.efi; it's more compatible because it works whether Secure Boot is enabled or not.
-
If you need to delete one of the boot entries that you made (you made a mistake and you need to re-do it [this definitely didn't happen to me]), first identify the number that was assigned to it (Boot000X) with efibootmgr.
Then, if you wanted to delete the Boot0006 entry, you'd specify -b 6 along with -B.
-
Verify with:
You should see them as: Numbers may vary. -
We can also add removable fallback bootloaders to act as a safety net if NVRAM (Non-Volatile RAM) entries are ever lost (this part is optional).
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --removable
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi2 --removable
This writes a standard fallback loader to \EFI\BOOT\BOOTX64.EFI.
It guarantees the machine will still boot even if all EFI entries vanish, so it's definitely a good thing to do.
-
-
Check that both ESPs exist and are populated.
Then check boot entries again. -
Test by rebooting. If we really want to test failover, we can test by disconnecting one disk at a time.
That should be it for setting up both a redundant root filesystem with RAID and mirroring the EFI System Partition.
Steps and Explanations¶
This is a little more long-form than the steps written above, but the process is the same.
Backup and Mirror Partition Table¶
Create a mirror of the main boot drive's partition table on the new backup drive.
Here we'll use sgdisk
to save the GUID partition table.
This will create the same partitions on the backup disk that the main boot disk
has, essentially mirroring the drive's partition layout. All partitions on the
new disk will be the same size as the original.
sudo sgdisk --backup=table.sda /dev/sda
sudo sgdisk --load-backup=table.sda /dev/sdc
sudo sgdisk --randomize-guids /dev/sdc
-
sgdisk
: Command-line GUID partition table (GPT) manipulator for Linux/Unix -
--backup=table.sda /dev/sda
: Save /dev/sda partition data to a backup file.
- The partition backup file is table.sda.
- The partition backup file is
--load-backup=table.sda /dev/sdc
: Load the saved /dev/sda partition layout from the backup file onto /dev/sdc.
-
--randomize-guids /dev/sdc
:-
Randomize the disk's GUID and all partitions' unique GUIDs (but not their
partition type code GUIDs). -
Used after cloning a disk in order to render all GUIDs unique once again.
-
Check your own disks!
These are the disks that I used.
/dev/sda is my main boot drive; it may not be your main boot drive.
/dev/sdc is my backup drive; it may not be your backup drive.
If you're using these notes as a guide, check which disks you need to be using on your machine.
Create RAID1 Array for Root Filesystem¶
We'll need to use one disk partition for this, cloned from the original boot disk.
- / (root): All the data in the root filesystem; will hold the LVM.
If we were using BIOS (Legacy boot), we'd need one more for /boot.
- /boot: Not needed on UEFI boot systems. This is only for systems that use BIOS/Legacy Boot mode.
-
If /dev/sda2 is UEFI (mounted at /boot/efi), then we do NOT create a RAID array for it.
-
UEFI firmware does not play well with md metadata, so it's backed up in a different way.
Degraded RAID Array
We need to create the array degraded first, using only the new disk,
so we don't touch the live boot disk. We do this by providing missing in place of the live partition.
If we use the live disk when creating the array, its contents will be clobbered.
This will cause data loss.
The new, empty disk is /dev/sdc
in this example. Don't put in the /dev/sda3
partition, or it will be wiped.
Create the RAID array (degraded) with the new backup disk's /dev/sdc3
partition.
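A sketch of that command, using the md1 array name and /dev/sdc3 partition from this page (double-check your own device names first):
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc3 missing
# "missing" keeps the live /dev/sda3 out of the array for now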
What about /dev/sda1
?
The /dev/sda1
partition is the BIOS boot partition, used by GRUB when the
disk uses GPT partitioning on a system booting in legacy BIOS (not UEFI).
This partition is a tiny raw area that GRUB uses to stash part of its own
boot code. It has no filesystem, no files, no mountpoint.
In UEFI boot systems, /dev/sda1
will contain BIOS boot instructions for
backwards compatibility, even if the system is in UEFI mode. Make sure to
check it.
Check to make sure the new array exists and is healthy.
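One way to do that (array name md1 as above):
cat /proc/mdstat            # quick summary of all md arrays
mdadm --detail /dev/md1     # full detail for the new array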
The [U_]
indicates that one out of two drives is present in the array and it
is healthy.
Migrate the Root Filesystem LVM to RAID1¶
-
Use pvcreate on the array and extend the VG. This adds /dev/md1 to LVM as a physical volume.
-
Add the array to the pve Volume Group. This is the volume group that all of the system's LVs are built on top of.
-
Live-migrate all LVs from the old PV (/dev/sda3) onto the array.
This will take a while. In my case, it took roughly an hour and a half.
- You can keep an eye on the elapsed time with this:
-
Remove /dev/sda3 from the volume group (commands for all four steps are sketched below).
This should have migrated all of the data on /dev/sda3
to /dev/md1
.
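Putting those four steps together, a hedged sketch of the commands (the pve VG, /dev/md1 array, and /dev/sda3 PV are the names used on this page; pvmove runs live and can take hours):
pvcreate /dev/md1             # turn the RAID array into an LVM physical volume
vgextend pve /dev/md1         # add it to the pve volume group
pvmove /dev/sda3 /dev/md1     # live-migrate all extents off the old PV
vgreduce pve /dev/sda3        # drop the old partition from the volume group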
Save RAID Config¶
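The two commands the bullets below are describing (Debian/Proxmox paths; a sketch, so compare it against your own setup):
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u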
-
mdadm --detail --scan
: Scans all active RAID arrays (/dev/md*) and prints their config.
- Then we save those RAID array configs to /etc/mdadm/mdadm.conf to make them persistent/permanent so they'll be assembled automatically on boot.
-
update-initramfs -u
: Rebuilds the initramfs and makes sure RAID modules and mdadm.conf are baked in.
-
The
initramfs
is the small Linux image the kernel loads before the root filesystem is ready. -
It needs to include the
mdadm
tools and RAID metadata so it can assemble RAID arrays before mounting/
. -
Without doing this, the system might fail to boot because the RAID wouldn't be assembled early enough.
-
If you skip this step and your system boots fine, it's possible your RAID array may be given a random name (e.g.,
md127
). If that happens, you can simply run the command and reboot to get the name you gave it.
-
Reboot and Confirm¶
Reboot the machine.
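Once it's back up, a quick sketch of the check (any lsblk invocation that shows mountpoints works):
lsblk        # the tree below is a trimmed example of what to look for
findmnt /    # should point at /dev/mapper/pve-root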
Confirm that / is on md1. Check for this section of the lsblk output:
└─sdc3 8:35 0 222.6G 0 part
└─md1 9:1 0 222.4G 0 raid1
├─pve-swap 252:0 0 8G 0 lvm [SWAP]
├─pve-root 252:1 0 65.6G 0 lvm /
...
pve-root, mounted on /, lives on the new md1 RAID array.
Finish the Mirror¶
Add the old partition (/dev/sda3) to the array now.
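A sketch of that, with the array and partition names used on this page:
mdadm /dev/md1 --add /dev/sda3    # add the old partition as the second mirror member
watch cat /proc/mdstat            # watch the resync progress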
Keep checking the /proc/mdstat
file for UU
.
Look for the entry related to the new RAID array:
Personalities : [raid1]
md1 : active raid1 sda3[0] sdc3[1]
234376192 blocks super 1.2 [2/2] [UU]
unused devices: <none>
The [UU]
at the end means the mirror is healthy and synchronized.
UEFI ESP Boot Redundancy (No mdadm
)¶
This is how I set up redundancy for the OS boot itself.
This is technically optional if all you want is data backup for your root filesystem, but I wanted to be able to boot from the backup drive if one fails rather than using a bootable USB to boot into recovery mode.
This assumes you have already cloned your boot disk's partition table.
We don't use mdadm
RAID for UEFI boot data because the EFI firmware does not
play well with md
metadata.
We will mirror the original boot disk's ESP to the new disk.
What is an ESP?
ESP stands for EFI System Partition. It's the FAT32 partition (in this
layout, it's /dev/sda2) that UEFI firmware reads at boot time to find
bootloader files (e.g.,
/EFI/BOOT/BOOTX64.EFI or /EFI/proxmox/grubx64.efi).
Make a New FAT32 Filesystem¶
First we make a FAT32 filesystem to mirror the one on the original boot disk.
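A sketch, using my backup disk's ESP partition (/dev/sdc2); check your own partition before formatting:
sudo mkfs.vfat -F 32 /dev/sdc2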
Mount the New Filesystem¶
After we've created the FAT32 filesystem, we need to create a mountpoint and
mount it so that we can copy over the files we need.
The mountpoint doesn't have to be named efi2, but that's what I chose for clarity.
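A sketch of creating and mounting it (the efi2 name is just the one chosen above):
sudo mkdir -p /boot/efi2
sudo mount /dev/sdc2 /boot/efi2
findmnt /boot/efi2     # verify it's mounted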
Mirror the ESP¶
Use rsync to create a full mirror of the /boot/efi directory into /boot/efi2.
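A sketch of that rsync (note the trailing slashes; the -a flag is a minimal assumption to preserve file attributes):
sudo rsync -a /boot/efi/ /boot/efi2/
# afterwards, compare the two trees; no output from diff is good
sudo diff -r /boot/efi /boot/efi2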
Include trailing slashes!
When specifying the directories in the rsync
command, you must
include the trailing slashes to create an actual mirror.
If you ran this without the trailing slashes, it would copy the /boot/efi
directory itself, which is not what we want. We only want the
contents of that directory, which is what the trailing slash tells
rsync
.
Once that's done, move onto the next step.
Install GRUB on Both ESPs¶
Now that the files are there, we can use grub-install
to install the
bootloader onto the partitions.
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox --recheck
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi2 --bootloader-id=proxmox2 --recheck
Make sure to run update-grub
afterwards. This will generate a new GRUB
configuration file.
sudo update-grub
# or, if that's not available for some reason
sudo grub-mkconfig -o /boot/grub/grub.cfg
The update-grub
command is a wrapper utility for grub-mkconfig
, so use whichever
floats ya boat.
Set up EFI Boot Manager¶
We'll need to add some new entries into the EFI boot manager specifying our
desired bootloader.
sudo efibootmgr -c -d /dev/sda -p 2 -L "Proxmox (sda2)" -l '\EFI\proxmox\shimx64.efi'
sudo efibootmgr -c -d /dev/sdc -p 2 -L "Proxmox (sdc2)" -l '\EFI\proxmox\shimx64.efi'
-
efibootmgr: This is the userspace application used to modify the UEFI boot manager. It can create, destroy, and modify boot entries.
- -c: Create a new boot entry.
- -d /dev/sdX: Specify the disk we're using.
- -p 2: Specify the boot partition on the disk.
- -L "Proxmox (sdX2)": The label to give to the boot entry.
- -l '\EFI\proxmox\shimx64.efi': Specify the bootloader to use.
Choosing the Right Bootloader
In this case, the OS loader is shimx64.efi
.
This bootloader is primarily for when Secure Boot is enabled in BIOS (UEFI).
If you do not have Secure Boot enabled, you can use grubx64.efi instead of shimx64.efi.
- shimx64.efi: a Microsoft-signed first-stage loader that validates and then chains to GRUB. Required when Secure Boot = ON.
- grubx64.efi: GRUB directly. Works when Secure Boot = OFF (or on systems allowing it).
I chose shimx64.efi since that was the default loader used by Proxmox.
There's really no drawback to using shimx64.efi; it's more compatible because it works whether Secure Boot is enabled or not.
Deleting an Entry
If you need to delete one of the boot entries that you made (you made
a mistake and you need to re-do it [this definitely didn't happen to me]),
first identify the number that was assigned to it (Boot000X) with efibootmgr.
Then, if you wanted to delete the Boot0006 entry, you'd specify -b 6 along with -B.
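A sketch of that (Boot0006 is just the example number from above):
sudo efibootmgr            # list entries and find the Boot#### number
sudo efibootmgr -b 6 -B    # delete the Boot0006 entry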
Once you've created the boot entries, verify that they're there and using the bootloader that you specified.
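A quick way to verify (shows each entry and the loader path it points to):
sudo efibootmgr -v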
Add Fallback Loaders (Optional)¶
We can also add removable fallback bootloaders to act as a safety net if
NVRAM (Non-Volatile RAM) entries are ever lost (this part is optional).
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --removable
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi2 --removable
This writes a standard fallback loader to \EFI\BOOT\BOOTX64.EFI.
It guarantees the machine will still boot even if all EFI entries vanish, so it's definitely a good thing to do.
Changing Boot Order¶
The boot order should be appropriately updated after doing all these steps, but
if you want to manually change it, it can be done with the -o
option.
Identify the numbers associated with the boot entries, then specify the new boot order using the Boot#### numbers in the command (comma-delimited).
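A minimal sketch; the entry numbers here are made-up examples, so substitute your own:
sudo efibootmgr                     # note the Boot#### numbers and current BootOrder
sudo efibootmgr -o 0007,0006,0005   # set the new boot order (example numbers)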
Reboot¶
Reboot to make sure you're booting from the new boot entries.
Delete Old Boot Entry (Optional)¶
Once all that's done, you should be able to safely delete the old boot entry.
Make sure you did it right!
This step is destructive. If you delete the original boot entry and your new ones are formatted incorrectly, it may cause you to be locked out of your system. Verify that your bootloader entries are correct before doing this.
You may want to test first by changing the boot order to boot using the new entries.
First, verify the number associated with the original boot entry.
Find the original entry. Mine was simply labeled "proxmox", and it was Boot0005 (number 5).
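A sketch using that entry number (verify yours with efibootmgr first):
sudo efibootmgr -b 5 -B    # delete Boot0005, the old "proxmox" entry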
Replacing a Failed Boot Disk (DR)¶
If /dev/sda
fails, we'll still have the backup to boot from. But we'll want
to replace the disk with a new one to maintain redundancy.
- Replace it with a new disk of equal or larger size.
- Clone the partition table.
- Add its partitions back into the arrays.
- Recreate and sync the EFI partition.
- Wait for md to show [UU].
(A combined sketch of these commands is below.)
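A hedged sketch of those steps, run as root, assuming the replacement disk shows up as /dev/sda again, /dev/sdc is the surviving mirror, and /mnt is free to use as a temporary mountpoint:
# clone the partition table from the surviving disk onto the replacement
sgdisk --backup=table.sdc /dev/sdc
sgdisk --load-backup=table.sdc /dev/sda
sgdisk --randomize-guids /dev/sda

# add the new root partition back into the RAID1 array
mdadm /dev/md1 --add /dev/sda3

# recreate its ESP and mirror the boot files from the surviving ESP
mkfs.vfat -F 32 /dev/sda2
mount /dev/sda2 /mnt
rsync -a /boot/efi2/ /mnt/
umount /mnt
# (you may also need to re-run grub-install and re-create the efibootmgr entry for this disk)

# watch the resync until it shows [UU]
cat /proc/mdstat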
ZFS Rebuild (for vmdata
)¶
The vmdata
ZFS pool was set up after the original OS installation.
The failed drive affected this ZFS pool.
This approach uses ZFS internal mirroring rather than traditional software RAID
through mdadm
.
Although, once we can afford a new disk, we will set up software RAID1 (mirror) on the main PVE boot drive.
Wipe and rebuild
Create new, mirrored pool.
Add to Proxmox, then verify.
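A hedged sketch of those three steps; the pool name vmdata comes from above, but the disk paths here are placeholders, so use your own /dev/disk/by-id paths:
# wipe the old disks (destructive!)
wipefs -a /dev/sdd /dev/sde

# create a new mirrored pool
zpool create vmdata mirror /dev/sdd /dev/sde

# add it as storage in the Proxmox UI (Datacenter > Storage > Add > ZFS), then verify
zpool status vmdata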
Troubleshooting¶
If GRUB fails to find the root device, we can modify /etc/default/grub to preload RAID by adding mdraid1x to GRUB_PRELOAD_MODULES.
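A sketch of that tweak (the exact module list is an assumption; append mdraid1x to whatever is already in /etc/default/grub), followed by regenerating the config:
# /etc/default/grub
GRUB_PRELOAD_MODULES="part_gpt mdraid1x"

# then rebuild the GRUB config
sudo update-grub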
If the RAID root doesn't boot, we can use a rescue/live CD to re-check
/etc/fstab
(shouldn't be an issue with LVM) and bootloader settings.