
ZFS scrubbing keeps repairing disks

I've got two SSDs in a RAID 1 (mirror) configuration on ZFS. They are quite old (~10 years, I guess) but have seen very little use over that time. This is my configuration, as shown by zpool status while a scrub is running:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub in progress since Wed Nov 22 15:56:15 2023
        176G scanned at 454M/s, 28.3G issued at 73.2M/s, 176G total
        4.50K repaired, 16.11% done, 00:34:21 to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     3  (repairing)
            sdb     ONLINE       0     0     6  (repairing)

As you can see, the scrub has found some checksum inconsistencies and has been able to repair them. The weird thing is that even if I write nothing new to the disks and run two scrubs back to back (starting the second only after the first has completed), every run finds new errors on both disks. The sequence I'm running is sketched below.
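For reference, this is roughly what I'm doing between runs (pool name tank as in the status above; zpool wait needs OpenZFS 2.0 or later, otherwise one can poll zpool status instead):

# reset the error counters, as the status output suggests
zpool clear tank

# first scrub: start it and block until it finishes
zpool scrub tank
zpool wait -t scrub tank
zpool status -v tank    # shows repaired data and per-device CKSUM counts

# second scrub immediately afterwards, with no writes in between
zpool scrub tank
zpool wait -t scrub tank
zpool status -v tank    # still shows fresh CKSUM errors on sda and sdb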

Looking at the output of dmesg, I see no problems related to the disks (no terrible-looking red errors). The only thing I spotted is this:

[18125.949842] RIP: 0033:0x7f5eb2eeee83
[18125.949849] RSP: 002b:00007f5eb21fc6f8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[18125.949859] RAX: ffffffffffffffda RBX: 00007f5ea40178d0 RCX: 00007f5eb2eeee83
[18125.949865] RDX: 0000000000008000 RSI: 00007f5ea40178d0 RDI: 0000000000000009
[18125.949870] RBP: 00007f5ea40178a4 R08: 0000000000000007 R09: 00007f5ea4007650
[18125.949876] R10: 3ade3c6b4360070e R11: 0000000000000293 R12: ffffffffffffff50
[18125.949882] R13: 0000000000000000 R14: 00007f5ea40178a0 R15: 00007f5eb21fcbf0
[18125.949894]  </TASK>
[18125.949898] INFO: task fish:591217 blocked for more than 120 seconds.
[18125.949906]       Tainted: P           OE      6.1.0-13-amd64 #1 Debian 6.1.55-1
[18125.949914] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18125.949919] task:fish            state:D stack:0     pid:591217 ppid:584012 flags:0x00000002
[18125.949930] Call Trace:
[18125.949933]  <TASK>
[18125.949939]  __schedule+0x351/0xa20
[18125.949954]  schedule+0x5d/0xe0
[18125.949961]  io_schedule+0x42/0x70
[18125.949969]  cv_wait_common+0xaa/0x130 [spl]
[18125.950003]  ? cpuusage_read+0x10/0x10
[18125.950014]  txg_wait_synced_impl+0xcb/0x110 [zfs]
[18125.950417]  txg_wait_synced+0xc/0x40 [zfs]
[18125.950812]  dmu_tx_wait+0x208/0x430 [zfs]
[18125.951127]  dmu_tx_assign+0x15e/0x510 [zfs]
[18125.951442]  zfs_dirty_inode+0x14d/0x360 [zfs]
[18125.951863]  zpl_dirty_inode+0x25/0x40 [zfs]
[18125.952277]  __mark_inode_dirty+0x53/0x380
[18125.952289]  touch_atime+0x1d1/0x1f0
[18125.952299]  iterate_dir+0xff/0x1c0
[18125.952309]  __x64_sys_getdents64+0x84/0x120
[18125.952318]  ? compat_filldir+0x190/0x190
[18125.952330]  do_syscall_64+0x58/0xc0
[18125.952342]  ? fpregs_assert_state_consistent+0x22/0x50
[18125.952352]  ? exit_to_user_mode_prepare+0x40/0x1d0
[18125.952362]  ? syscall_exit_to_user_mode+0x27/0x40
[18125.952370]  ? do_syscall_64+0x67/0xc0
[18125.952380]  ? do_syscall_64+0x67/0xc0
[18125.952391]  entry_SYSCALL_64_after_hwframe+0x64/0xce

That does look to me like a dump of something ZFS-related, even though the task listed as blocked is fish (my shell): reading the trace bottom-up, fish was listing a directory (getdents64), which triggered an atime update (touch_atime -> zfs_dirty_inode) that then sat waiting for a ZFS transaction group to sync (txg_wait_synced). Might that be my problem (I don't think so), or does this simply mean that my disks are faulty and about to die?
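In case it's relevant, I can also pull SMART data from the drives; a minimal check with smartmontools would be something like this (device names as in the pool above):

# overall health verdict plus the raw attribute table; on old SATA SSDs,
# reallocated/pending sectors, CRC errors and the wear indicator matter most
smartctl -H /dev/sda
smartctl -A /dev/sda
smartctl -H /dev/sdb
smartctl -A /dev/sdb

# optionally, kick off the drive's own extended self-test
smartctl -t long /dev/sda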

I'm on a Debian 12 Linux box, if that helps.

Thanks in advance for any help ;-)

