ZFS Failed Disk Replacement

From RoseWiki
Revision as of 21:27, 12 July 2025 by Maeve (talk | contribs)

Copy partitions from good disk sda to blank disk sdb

sgdisk /dev/sda -R /dev/sdb	# sgdisk <source disk> -R <target disk>: replicate the partition table of sda onto sdb
sgdisk -G /dev/sdb	# randomize the disk and partition GUIDs on the new disk, since they were copied from the other drive
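
The argument order of sgdisk -R is easy to get backwards, and reversing it would overwrite the good disk's table with the blank one. A minimal sketch of a guard follows. It assumes a hypothetical helper named clone_table_cmds (not part of sgdisk or Proxmox) that only prints the two commands above so they can be reviewed before being run by hand:

```shell
# clone_table_cmds SRC DST
# Hypothetical helper: prints (does not run) the sgdisk commands that copy
# the partition table from SRC to DST and then randomize DST's GUIDs.
clone_table_cmds() {
    src=$1
    dst=$2
    # Refuse missing arguments and the "same disk twice" mistake.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_table_cmds <source disk> <different target disk>" >&2
        return 1
    fi
    printf 'sgdisk %s -R %s\n' "$src" "$dst"
    printf 'sgdisk -G %s\n' "$dst"
}

# Device names are the ones used on this page; substitute your own.
clone_table_cmds /dev/sda /dev/sdb
```

Printing rather than executing keeps the destructive step a deliberate, manual one.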

Use parted to verify the partition table on /dev/sdb

(parted) select /dev/sdb
Using /dev/sdb
(parted) p
Model: ATA WDC WD2000FYYZ-0 (scsi)
Disk /dev/sdb: 2000398934016B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start       End             Size            File system  Name                  Flags
 1      1048576B    2097151B        1048576B                     Grub-Boot-Partition   bios_grub
 2      2097152B    136314879B      134217728B      fat32        EFI-System-Partition  boot, esp
 3      136314880B  2000397885439B  2000261570560B  zfs          PVE-ZFS-Partition

(Ok partitions copied)
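
The same check can be done without reading the table by eye. The sketch below assumes parted's machine-readable mode (parted -m), which prints one colon-separated line per partition. The helper name partition_rows is hypothetical, and the sample input only illustrates the shape of that output using values from the table above:

```shell
# partition_rows: read `parted -m ... print` output on stdin and keep only
# the per-partition rows (lines of the form "<number>:<start>:<end>:...").
partition_rows() {
    grep -E '^[0-9]+:'
}

# Illustrative input in the colon-separated shape parted -m emits:
printf '%s\n' \
    'BYT;' \
    '/dev/sdb:2000398934016B:scsi:512:512:gpt:ATA WDC WD2000FYYZ-0:;' \
    '1:1048576B:2097151B:1048576B::Grub-Boot-Partition:bios_grub;' \
    '3:136314880B:2000397885439B:2000261570560B:zfs:PVE-ZFS-Partition:;' |
    partition_rows

# On real disks (assumed device names, needs root):
#   parted -m /dev/sda unit B print | partition_rows > /tmp/sda.rows
#   parted -m /dev/sdb unit B print | partition_rows > /tmp/sdb.rows
#   diff /tmp/sda.rows /tmp/sdb.rows && echo "layouts match"
```

Dropping the header and device lines means only the partition geometry is compared, so two disks with different model strings can still match.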

Copy data from /dev/sda1 to /dev/sdb1

dd if=/dev/sda1 of=/dev/sdb1 bs=512 #This is the bios boot partition  
root@folkvang:~# dd if=/dev/sda1 of=/dev/sdb1 bs=512
2014+0 records in   
2014+0 records out  
1031168 bytes (1.0 MB) copied, 0.10164 s, 10.1 MB/s  
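
One way to confirm the copy is a byte-for-byte comparison of the two partitions. This is a minimal sketch using a hypothetical verify_copy helper around cmp; the device names in the comment are the ones assumed on this page:

```shell
# verify_copy A B: report whether two files or block devices have identical
# contents. cmp -s is silent and signals the result through its exit status.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# On the real partitions (assumed device names, needs root):
#   verify_copy /dev/sda1 /dev/sdb1
```

Because both partitions come from the same replicated table they have the same size, so a clean result covers the whole partition.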

Replace the failed partition in the zpool

Find the ID of the failed block device

root@folkvang:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h25m with 0 errors on Sun May  8 11:20:27 2016
config:

        NAME                    STATE     READ WRITE CKSUM
        rpool                   DEGRADED     0     0     0
          mirror-0              DEGRADED     0     0     0
            993077023721924477  FAULTED      0     0     0  was /dev/sdk2
            sdk2                ONLINE       0     0     0

errors: No known data errors
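
The ID can also be pulled out of the status text rather than copied by hand. A minimal sketch, assuming the config rows keep the NAME / STATE column order shown above; faulted_devices is a hypothetical helper name:

```shell
# faulted_devices: read `zpool status` output on stdin and print the first
# column of every config row whose STATE column is FAULTED.
faulted_devices() {
    awk '$2 == "FAULTED" { print $1 }'
}

# On a live system (assumed pool name):
#   zpool status rpool | faulted_devices
```

For the output above this would select the row whose name is the numeric ID, which is the value zpool replace expects as the old device.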

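Call zpool to replace the failed device. The example output here uses /dev/sdk and /dev/sdl rather than /dev/sda and /dev/sdb; substitute your own device names. In the partition layout shown above, the ZFS partition is partition 3.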

root@folkvang:~# zpool replace -f rpool 993077023721924477 /dev/sdl2

Make sure to wait until resilver is done before rebooting.

root@folkvang:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Sep 2 16:45:53 2016
        13.2M scanned out of 8.83G at 902K/s, 2h50m to go
        12.9M resilvered, 0.15% done
config:

        NAME                      STATE     READ WRITE CKSUM
        rpool                     DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            replacing-0           UNAVAIL      0     0     0
              993077023721924477  FAULTED      0     0     0  was /dev/sdk2
              sdl2                ONLINE       0     0     0  (resilvering)
            sdk2                  ONLINE       0     0     0

errors: No known data errors
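
Waiting for the resilver can be scripted by polling the status text. This is a sketch under the assumption that the scan section contains the phrases "resilver in progress" and "N% done" as in the output above; both helper names are hypothetical. Newer OpenZFS releases also provide zpool wait -t resilver <pool>, so check whether your version has it before relying on a loop like this:

```shell
# resilver_in_progress: succeed if the `zpool status` text on stdin reports
# a running resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# resilver_percent: print the figure from the "N% done" line, if present.
resilver_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% done.*/\1/p'
}

# Polling loop for a live system (assumed pool name, 60 second interval):
#   while zpool status rpool | resilver_in_progress; do
#       zpool status rpool | resilver_percent
#       sleep 60
#   done
#   echo "resilver finished, safe to continue"
```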

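After replacing the drive, make sure the new disk is bootable: set up its EFI system partition with proxmox-boot-tool and reinstall GRUB on both disks. As above, substitute your own device names.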

proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool refresh
proxmox-boot-tool status
proxmox-boot-tool clean
grub-install /dev/sdk
grub-install /dev/sdl
update-grub

OPTIONAL: Set Zpool to Expand

We can tell a zpool to expand to fill all available storage on its devices. Automatic expansion is controlled by the pool's autoexpand property, which is off by default; enable it with zpool set autoexpand=on <zpool>. If the pool still does not grow, expand each device explicitly with zpool online -e <zpool> <device>. Note that zpool offline and zpool online act on individual devices, not on the whole pool, and a device does not need to be taken offline before running zpool online -e on it. In a mirror, the extra space only becomes available once every device has been expanded.

If you copied the partition table from a smaller drive onto a larger one, this won't work right away, as the partition table hasn't been reconfigured and the ZFS partition is still its original size. You'll have to move the end partition and then expand the first one before running zpool online -e.
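
As a sketch of the expansion step: zpool online -e takes a pool name followed by a device, and the autoexpand pool property can be set with zpool set. The hypothetical helper expand_cmds below only prints the commands for a given pool and device list so they can be reviewed first; the pool and device names are the ones from the example output on this page:

```shell
# expand_cmds POOL DEVICE...
# Hypothetical helper: prints (does not run) the commands that enable
# autoexpand on POOL and then expand each listed DEVICE in place.
expand_cmds() {
    if [ "$#" -lt 2 ]; then
        echo "usage: expand_cmds <pool> <device>..." >&2
        return 1
    fi
    pool=$1
    shift
    printf 'zpool set autoexpand=on %s\n' "$pool"
    for dev in "$@"; do
        printf 'zpool online -e %s %s\n' "$pool" "$dev"
    done
}

# Assumed names; substitute your own pool and ZFS partitions.
expand_cmds rpool sdk2 sdl2
```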