I am still a newbie with ZFS. We had a faulty disk in a ZFS pool (on Ubuntu 22.04.2 LTS). It was replaced automatically by a spare disk (I quote only the relevant parts):
zpool status ZFS
  pool: ZFS
 state: DEGRADED
...
          raidz1-2                    DEGRADED     0     0     0
            wwn-0x5000039c48513d8d    ONLINE       0     0     0
            spare-1                   DEGRADED     0     0     0
              wwn-0x5000039c48512f59  UNAVAIL      4   128     0
              wwn-0x5000039c48508ee1  ONLINE       0     0     0
...
        spares
          wwn-0x5000039c48508ee1      INUSE     currently in use
errors: No known data errors
I detached the faulty disk from the pool, after which the physical disk was replaced:
zpool detach ZFS wwn-0x5000039c48512f59
Now the pool seems OK:
zpool status ZFS
  pool: ZFS
 state: ONLINE
  scan: resilvered 826G in 01:40:08 with 0 errors on Thu May 2 00:18:18 2024
config:
        NAME                        STATE     READ WRITE CKSUM
        ZFS                         ONLINE       0     0     0
...
          raidz1-2                  ONLINE       0     0     0
            wwn-0x5000039c48513d8d  ONLINE       0     0     0
            wwn-0x5000039c48508ee1  ONLINE       0     0     0
            wwn-0x5000039c48513d35  ONLINE       0     0     0
            wwn-0x5000039c48510189  ONLINE       0     0     0
            wwn-0x5000039c48516f0d  ONLINE       0     0     0
        logs
          wwn-0x5002538022c12e30    ONLINE       0     0     0
        cache
          wwn-0x5002538022c12d70    ONLINE       0     0     0
errors: No known data errors
The remaining task is to add the new disk as a spare, but how do I identify which is the right disk to add with:
zpool add -f ZFS spare (disk)
lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
...
sdj        8:144  0  7.3T  0 disk
├─sdj1     8:145  0  7.3T  0 part
└─sdj9     8:153  0    8M  0 part
sdk        8:160  0  7.3T  0 disk
├─sdk1     8:161  0  7.3T  0 part
└─sdk9     8:169  0    8M  0 part
sdl        8:176  0  7.3T  0 disk
sdm        8:192  0  7.3T  0 disk
├─sdm1     8:193  0  7.3T  0 part
└─sdm9     8:201  0    8M  0 part
...
ls -al /dev/disk/by-id/|grep sdl
lrwxrwxrwx 1 root root 9 May 13 12:46 scsi-35000c500f8025907 -> ../../sdl
lrwxrwxrwx 1 root root 9 May 13 12:46 scsi-SSEAGATE_ST8000NM024B_WWZ51SS9 -> ../../sdl
lrwxrwxrwx 1 root root 9 May 13 12:46 wwn-0x5000c500f8025907 -> ../../sdl
Am I correct in reasoning that because sdl is the only disk that isn't partitioned or connected to any LVM etc., it must be the new replacement disk, which I should add as a spare:
zpool add -f ZFS spare wwn-0x5000c500f8025907
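The "only disk without partitions" reasoning can be checked mechanically instead of by eye. What follows is only a minimal sketch: `unpartitioned_disks` is a made-up helper name, and it assumes an `lsblk` from util-linux that offers the `NAME`, `TYPE` and `PKNAME` columns. It prints every whole disk that has no child device at all (partition, LVM volume, RAID member, and so on).

```shell
#!/usr/bin/env bash
# Sketch: list whole disks that have no child devices.
# Input on stdin: "NAME TYPE PKNAME" rows, as printed by
#   lsblk -rn -o NAME,TYPE,PKNAME
# (unpartitioned_disks is a hypothetical helper name, not a standard tool).
unpartitioned_disks() {
    awk '
        $2 == "disk" { disks[$1] = 1 }      # remember every whole disk
        $3 != ""     { has_child[$3] = 1 }  # mark the parent of every child row
        END {
            for (d in disks)
                if (!(d in has_child)) print d
        }
    ' | sort
}

# Usage on a live system:
#   lsblk -rn -o NAME,TYPE,PKNAME | unpartitioned_disks
```

A disk showing up here only means it has no children. A filesystem or ZFS label can still sit directly on the whole device, so this narrows the candidates rather than proving the disk is unused.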
Or should I double-check this somehow, for example by verifying that sdl really contains no data and is not in use (how?), or by checking whether syslog or blkid shows that this disk was added recently?
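As a sketch of what such a double-check could look like: the helper below (`in_pool` is a hypothetical name) reads `zpool status` text and reports whether a given by-id name is already listed as a device. The commented commands after it are standard tools (`wipefs`, `zdb`, `journalctl`, `smartctl`), but whether they are installed here, and what exactly they print, is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a device name appears in `zpool status` output on stdin.
# in_pool is a hypothetical helper name. It matches lines whose first field
# starts with the given name, so a "-part1" suffix would also match.
in_pool() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }'
}

# Usage on a live system (device names taken from the output quoted above):
#   zpool status ZFS | in_pool wwn-0x5000c500f8025907 && echo "already in the pool"
#
# Read-only checks that the disk carries no data and when it appeared:
#   sudo wipefs --no-act /dev/sdl      # lists known signatures, writes nothing
#   sudo zdb -l /dev/sdl               # prints ZFS labels if any exist
#   sudo journalctl -k | grep -w sdl   # kernel messages mentioning sdl
#   sudo smartctl -i /dev/sdl          # serial number (needs smartmontools)
```

The `scsi-SSEAGATE_ST8000NM024B_WWZ51SS9` link name appears to embed the model and serial number, so comparing it (or the `smartctl -i` serial) with the label printed on the new drive is probably the most direct confirmation.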
Naturally I should have checked these things before the disk swap, such as which disk the unavailable wwn was pointing to, but I keep learning as I go...
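For next time, the wwn-to-device mapping could be recorded while all disks are still healthy. This is a minimal sketch under assumptions: `wwn_map` is a hypothetical helper, and the output file path in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Sketch: print "wwn-name kernel-device" for every whole-disk wwn link.
# wwn_map is a hypothetical helper; the directory argument defaults to
# /dev/disk/by-id and exists mainly so the function can be tried elsewhere.
wwn_map() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/wwn-*; do
        [ -L "$link" ] || continue                 # skip if the glob matched nothing
        case "$link" in *-part*) continue ;; esac  # ignore partition links
        printf '%s %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Usage while the pool is healthy (example path, keep the file off the pool):
#   wwn_map > /root/wwn-map-$(date +%F).txt
# Later, to see where a failed wwn used to point:
#   grep wwn-0x5000039c48512f59 /root/wwn-map-*.txt
```

`zpool status -L ZFS` shows the resolved device names directly, but that may no longer help once a disk is UNAVAIL and its link is gone, which is why a saved copy is useful.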