
Access to ZFS pool when one drive has failed is incredibly slow


I inherited a ZFS box that is having a lot of issues. Checking the status, I see there are a few drives with problems:

ganymede $ zpool status -x
  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 15 00:51:49 2024
        88.1M scanned out of 36.2T at 6.77M/s, (scan is slow, no estimated time)
        25.3M resilvered, 0.00% done
config:

        NAME                                      STATE     READ WRITE CKSUM
        dpool                                     DEGRADED     0     0     0
          mirror-0                                DEGRADED     0     0     0
            12151399272057691850                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA11E6HJ-part1
            ata-ST8000NM0055-1RM112_ZA158JRW      ONLINE       0     0     0
          mirror-1                                DEGRADED     0     0     0
            ata-ST8000NM0055-1RM112_ZA15FG7E      ONLINE       0     0     0  (resilvering)
            ata-ST8000NM0055-1RM112_ZA15FGCM      DEGRADED    22     0    12  too many errors
          mirror-2                                ONLINE       0     0     0
            ata-ST8000NM0055-1RM112_ZA164M9J      ONLINE       0     0     0  (resilvering)
            ata-ST8000NM0055-1RM112_ZA164QKP      ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5J1K05JFE6C  ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5J9K004FE6C  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5J9K005FE6C  ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5LEK019FE6C  ONLINE       0     0     0
          mirror-5                                ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5J9K007FE6C  ONLINE       0     0     0
            ata-TOSHIBA_MC04ACA600A_X5JFK001FE6C  ONLINE       0     0     0

errors: No known data errors

I am trying to pull the data off this system (back it up to S3) before replacing the disks. However, the failing drive in mirror-1 (ata-ST8000NM0055-1RM112_ZA15FGCM) seems to be dragging down all I/O on the pool: if I let the resilver run, its throughput dwindles to KB/s, and a week later it is still going.
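One idea is to pause the resilver so it stops competing with my reads. Recent OpenZFS releases seem to expose a module tunable for this; a rough sketch of what I'd try (assuming the parameter exists on this build):

# Suspend scrub/resilver progress (1 = suspended, 0 = resumed).
# Parameter name taken from the OpenZFS module docs; may not exist
# on older zfs-on-linux releases.
echo 1 > /sys/module/zfs/parameters/zfs_scan_suspend_progress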

Looking at the dmesg output, I see a ton of errors like these:

[  464.866611] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[  464.866635] sd 1:0:27:0: [sdaa] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[  464.866637] sd 1:0:27:0: [sdaa] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  464.866653] sd 1:0:27:0: [sdaa] tag#2 Sense Key : Medium Error [current] [descriptor]
[  464.866658] sd 1:0:27:0: [sdaa] tag#0 CDB: Read(16) 88 00 00 00 00 02 78 25 d7 38 00 00 00 08 00 00
[  464.866666] sd 1:0:27:0: [sdaa] tag#2 Add. Sense: Unrecovered read error
[  464.866670] print_req_error: I/O error, dev sdaa, sector 10605680440
[  464.866677] sd 1:0:27:0: [sdaa] tag#2 CDB: Read(16) 88 00 00 00 00 02 78 25 d5 68 00 00 00 f0 00 00
[  464.866767] print_req_error: critical medium error, dev sdaa, sector 10605680096
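I assume I should also confirm via SMART that it's the drive itself failing, not the cabling or HBA; something like:

# Full SMART report for the suspect disk (look at reallocated/pending
# sector counts and the device error log)
smartctl -a /dev/sdaa
# Or just the overall health verdict
smartctl -H /dev/sdaa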

Considering that every mirror in the pool still has at least one good drive, is there a way I can simply remove the disk that is causing issues (I don't have physical access) so that I can get the data off the server?
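My understanding is that ZFS can retire the bad half of a mirror administratively; something like the following (device name taken from the zpool status output above), though I'm not sure whether it's safe to do mid-resilver:

# Take the failing disk offline so ZFS stops issuing I/O to it
zpool offline dpool ata-ST8000NM0055-1RM112_ZA15FGCM
# Or remove it from mirror-1 entirely, leaving that vdev as a single disk
zpool detach dpool ata-ST8000NM0055-1RM112_ZA15FGCM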

I tried disabling the disk at the kernel level:

sync
echo 1 > /sys/block/sdaa/device/delete

But access to the data on ZFS remained extremely slow (e.g., 10 minutes to copy a 93 MB file to AWS S3 using the awscli).
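To check whether the remaining slowness still comes from one lagging device, I figure per-vdev statistics would show it; something like:

# Per-vdev I/O statistics for the pool, refreshed every 5 seconds
zpool iostat -v dpool 5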

Just trying to figure out the best path forward when a system is in this state.
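For the backup itself, I'm also wondering whether streaming a snapshot directly to S3 would cope better with the degraded pool than file-by-file copies; a rough sketch (the dataset and bucket names here are made up):

# Snapshot the dataset, then stream it compressed to S3.
# dpool/data and s3://my-backup-bucket are placeholders.
zfs snapshot dpool/data@s3backup
zfs send dpool/data@s3backup | gzip | aws s3 cp - s3://my-backup-bucket/dpool-data.zfs.gz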

