ZFS Failed Disk Replacement


Copy partitions from good disk sda to blank disk sdb

sgdisk /dev/sda -R /dev/sdb	# sgdisk  /dev/sda<From this disk> -R /dev/sdb<Replicate to this disk>
sgdisk -G /dev/sdb # randomize the GUID on the new disk since it was copied from the other drive.
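Before replicating, double-check which device is the blank replacement disk; a quick sketch (the sda/sdb names here are just examples, adjust to your system):

lsblk -o NAME,SIZE,SERIAL,MODEL     # confirm which disk is the new, empty one
ls -l /dev/disk/by-id/              # map the stable by-id names to sda/sdb
sgdisk -p /dev/sdb                  # the blank disk should show no partitions yet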

Using Parted to verify the partition table of /dev/sdb

(parted) select /dev/sdb
Using /dev/sdb
(parted) p
Model: ATA WDC WD2000FYYZ-0 (scsi)
Disk /dev/sdb: 2000398934016B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start        End              Size             File system  Name                  Flags
 1      1048576B     2097151B         1048576B                      Grub-Boot-Partition   bios_grub
 2      2097152B     136314879B       134217728B       fat32        EFI-System-Partition  boot, esp
 3      136314880B   2000397885439B   2000261570560B   zfs          PVE-ZFS-Partition

(Ok partitions copied)
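The same check can also be run non-interactively, which is handy for scripting; a sketch assuming the new disk is /dev/sdb:

parted /dev/sdb unit B print    # print the copied partition table in bytes without an interactive session
sgdisk -p /dev/sdb              # equivalent summary from sgdisk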

Copy data from /dev/sda1 to /dev/sdb1

dd if=/dev/sda1 of=/dev/sdb1 bs=512   # This is the BIOS boot partition
root@folkvang:~# dd if=/dev/sda1 of=/dev/sdb1 bs=512
2014+0 records in   
2014+0 records out  
1031168 bytes (1.0 MB) copied, 0.10164 s, 10.1 MB/s  
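Optionally verify the copy before moving on; cmp exits silently when the two partitions are byte-for-byte identical (device names assumed as above):

cmp /dev/sda1 /dev/sdb1 && echo "sdb1 matches sda1"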

Replace the failed partition in the zpool

Find the ID of the failed block device

root@folkvang:~# zpool status
pool: rpool
    state: DEGRADED
    status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
    see: http://zfsonlinux.org/msg/ZFS-8000-4J
    scan: scrub repaired 0 in 0h25m with 0 errors on Sun May  8 11:20:27 2016
    config:
    NAME                    STATE     READ WRITE CKSUM
    rpool                   DEGRADED     0     0     0
      mirror-0              DEGRADED     0     0     0
        993077023721924477  FAULTED      0     0     0  was /dev/sdk2
        sdk2                ONLINE       0     0     0
    errors: No known data errors
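The long numeric ID (993077023721924477 here) is what zpool replace needs to name the missing device. If you want the pool to track the new disk by a stable name instead of sdX, look up its by-id alias first (a sketch; the exact alias depends on the drive's model and serial):

ls -l /dev/disk/by-id/ | grep sdl2   # find the persistent alias that points at the new partition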

Call zpool to replace the failed device

root@folkvang:~# zpool replace -f rpool 993077023721924477 /dev/sdl2

Make sure to wait until the resilver is done before rebooting.

root@folkvang:~# zpool status
pool: rpool
    state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
    scan: resilver in progress since Fri Sep  2 16:45:53 2016
        13.2M scanned out of 8.83G at 902K/s, 2h50m to go
        12.9M resilvered, 0.15% done
    config:
    NAME                      STATE     READ WRITE CKSUM
    rpool                     DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        replacing-0           UNAVAIL      0     0     0
          993077023721924477  FAULTED      0     0     0  was /dev/sdk2
          sdl2                ONLINE       0     0     0  (resilvering)
        sdk2                  ONLINE       0     0     0
    errors: No known data errors
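The resilver runs in the background; to keep an eye on it without retyping the command, one option is:

watch -n 30 zpool status rpool   # refresh the pool status every 30 seconds until the resilver finishes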

After fixing the drive, we need to ensure that the boot sectors are configured.

(The device names below are examples: the ESP being re-initialized is /dev/sdb2, while the grub-install lines use /dev/sdk and /dev/sdl from the host shown in the zpool output above; adjust them to match your system.)

proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool refresh
proxmox-boot-tool status
proxmox-boot-tool clean
grub-install /dev/sdk
grub-install /dev/sdl
update-grub
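On a UEFI host you can also confirm that the firmware has a boot entry pointing at the refreshed ESP; a hedged check (requires the efibootmgr package):

efibootmgr -v   # verify a boot entry references the new disk's EFI System Partition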

OPTIONAL: Set Zpool to Expand

We can make a zpool expand to fill all available storage by first taking the replaced device offline with zpool offline <pool> <device>, then bringing it back online with a flag that tells ZFS to grow onto the extra space: zpool online -e <pool> <device>. Expansion happens automatically when the pool's autoexpand property is on, but if it does not appear to be working, running zpool online -e explicitly rules that setting out; a short sketch follows at the end of this section.

If you cloned these drives directly from one drive to a larger drive, this won't work, as the partition table hasn't been adjusted for the larger disk. You'll have to move the backup GPT data to the end of the new disk (e.g. with sgdisk -e) and then grow the ZFS partition before running zpool online -e.
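Putting the optional expansion together, a minimal sketch, assuming the pool is rpool and the replaced ZFS data partition is /dev/sdb3:

zpool set autoexpand=on rpool     # let the pool grow onto larger devices automatically
zpool offline rpool /dev/sdb3     # briefly take the replaced device offline
zpool online -e rpool /dev/sdb3   # bring it back and expand onto the unused space
zpool list rpool                  # check whether SIZE now reflects the larger capacity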