I am redesigning my homelab servers from scratch, and after some experimentation I want to try ZFS.
I know there is a limitation that disks within a vdev should all be the same size, otherwise each disk only contributes as much capacity as the smallest one.
That's a problem for me. I have an assortment of variously sized HDDs, and no budget for upgrades. My goal is to use what I have without compromising ZFS.
I know ZFS is an enterprise file system, and that in the enterprise world it's cheaper to buy identical disks rather than do what I am about to do.
I have come up with the following workaround idea, which I want to validate.
Initial setup:
- 4 disks of 16 TB each (sd[abcd])
- 3 disks of 8 TB each (sd[efg])
- 2 disks of 6 TB each (sd[hi])
I create a partition on every disk as large as the smallest non-zero free space (accounting for partition alignment), and repeat this until no disk has any free space left.
Final picture:
- 4 disks of 16 TB each (sd[abcd]): part1: 6 TB, part2: 2 TB, part3: 8 TB
- 3 disks of 8 TB each (sd[efg]): part1: 6 TB, part2: 2 TB
- 2 disks of 6 TB each (sd[hi]): part1: 6 TB

```
     +-------------------------+------------+----------------------------------+
sda: | sda1: 6 TB              | sda2: 2 TB | sda3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdb: | sdb1: 6 TB              | sdb2: 2 TB | sdb3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdc: | sdc1: 6 TB              | sdc2: 2 TB | sdc3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdd: | sdd1: 6 TB              | sdd2: 2 TB | sdd3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sde: | sde1: 6 TB              | sde2: 2 TB |
     +-------------------------+------------+
sdf: | sdf1: 6 TB              | sdf2: 2 TB |
     +-------------------------+------------+
sdg: | sdg1: 6 TB              | sdg2: 2 TB |
     +-------------------------+------------+
sdh: | sdh1: 6 TB              |
     +-------------------------+
sdi: | sdi1: 6 TB              |
     +-------------------------+
```
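A minimal sketch of how I'd implement that with sgdisk (my assumptions: GPT, 512-byte logical sectors, and that the "6 TB" / "8 TB" / "16 TB" labels above are nominal, so I derive the real layer sizes from the disks themselves; the copies of each layer on the bigger disks end up a few MiB larger because of GPT and alignment overhead, which RAIDZ simply truncates away):

```
# Layer sizes in 512-byte sectors, taken from the real disks.
layer1=$(blockdev --getsz /dev/sdh)                     # layer 1: the whole smallest disk
layer2=$(( $(blockdev --getsz /dev/sde) - layer1 ))     # layer 2: what is left on an 8 TB disk

# 6 TB disks: a single partition spanning the whole disk
for d in /dev/sd[hi]; do
    sgdisk -a 2048 -n 1:0:0 "$d"
done

# 8 TB disks: layer 1, then the remainder (~2 TB)
for d in /dev/sd[efg]; do
    sgdisk -a 2048 -n "1:0:+$layer1" -n 2:0:0 "$d"
done

# 16 TB disks: layer 1, layer 2, then the remainder (~8 TB)
for d in /dev/sd[abcd]; do
    sgdisk -a 2048 -n "1:0:+$layer1" -n "2:0:+$layer2" -n 3:0:0 "$d"
done
```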
This gives me equally sized partitions on different physical disks that I can use as building blocks for multiple RAIDZ vdevs:
```
# Which physical disk each column lives on:
#          |----- 16 TB -----| |--- 8 TB ---| |- 6TB -|
zpool create tank \
    raidz2 sda1 sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 \
    raidz2 sda2 sdb2 sdc2 sdd2 sde2 sdf2 sdg2 \
    raidz2 sda3 sdb3 sdc3 sdd3
# 1st raidz2: part1 (6 TB) from every disk
# 2nd raidz2: part2 (2 TB) from the 16 TB and 8 TB disks
# 3rd raidz2: part3 (8 TB) from the 16 TB disks only
```
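If my arithmetic is right, raidz2 costs two members per vdev, so the usable capacity works out to roughly:

- part1 vdev: (9 - 2) × 6 TB = 42 TB
- part2 vdev: (7 - 2) × 2 TB = 10 TB
- part3 vdev: (4 - 2) × 8 TB = 16 TB

or about 68 TB in total, with the first vdev holding the lion's share of the free space from day one.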
I think the following should be true:
- every RAIDZ vdev uses only partitions that sit on different physical drives, so if one disk fails, at most one member of each RAIDZ vdev fails with it
- when one drive fails, every RAIDZ that has a partition on it degrades (all three if it's one of the 16 TB disks), but replacing the disk and repartitioning it the same way will let ZFS recover transparently (see the sketch after this list)
- since ZFS prefers writes to the vdev with the most free space, and the first RAIDZ will have by far the most, the other two shouldn't see much traffic until the first 6 TB layer fills up, so I hope there won't be IOPS bottlenecks from partitions competing for the same spindles
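To make the replacement step above concrete, I imagine it would look roughly like this for a failed 16 TB disk, assuming the new drive shows up as /dev/sdj (device names are made up, and I'm reusing the layer sizes from the partitioning sketch above; note that all three resilvers then hammer the same new spindle at once):

```
# Recreate the same three layers on the replacement disk
sgdisk -a 2048 -n "1:0:+$layer1" -n "2:0:+$layer2" -n 3:0:0 /dev/sdj

# Swap the dead partitions for the new ones, one per degraded vdev
zpool replace tank sda1 sdj1
zpool replace tank sda2 sdj2
zpool replace tank sda3 sdj3

zpool status tank   # watch all three resilvers run against the same disk
```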
The final touch is to set the disks' IO scheduler to "noop" (or "none" on newer multi-queue kernels), though I am not sure whether ZFS is intelligent enough to schedule across the vdevs and realize that sda1 and sda2 live on the same spinning-rust device.
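As far as I understand, the scheduler is a property of the whole disk's queue rather than of individual partitions, so that step would look something like this (the sysfs path is the standard one; "none" vs "noop" depends on the kernel):

```
# The scheduler lives on the parent disk's queue; sda1 and sda2 share it.
for disk in sda sdb sdc sdd sde sdf sdg sdh sdi; do
    echo none > "/sys/block/$disk/queue/scheduler"    # "noop" on older, non-blk-mq kernels
done
```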
In theory, I don't see why this setup wouldn't work, but I may be missing something. Are there any downsides or dangers in running this configuration?