I am redesigning my homelab servers from scratch, and I want to give ZFS a try after some experimentation I've done in a VM.
I know there is a limitation: all disks in a vdev should be the same size (otherwise each disk contributes only as much capacity as the smallest one). That's a problem for me - I have an assorted bunch of variously sized HDDs, and unfortunately I have no budget for upgrades right now. So my goal is to squeeze the maximum out of what I already have without compromising ZFS.
Disclaimer: I know ZFS is an enterprise file system, and that in the enterprise world it's cheaper to buy a bag of identical disks than to pay engineers to do what I am about to do.
So, to do this anyway, I have come up with the following workaround idea, which I want to validate.
My initial setup:
- 4 disks of 16 TB each (sd[abcd])
- 3 disks of 8 TB each (sd[efg])
- 2 disks of 6 TB each (sd[hi])
I carve a partition on every disk as large as the smallest non-zero free space - of course, while accounting for partition alignment etc. - and repeat this until no disk has any free space left.
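In shell terms, one round-by-round way to do that might look like this. It's a sketch only: device names are the ones from the lists here, sizes are measured in 512-byte sectors from the smallest disk of each round rather than copied from the marketing TB figures, and 4Kn drives or different alignment would need adjustments:

```
# Sketch: implement "carve a partition as large as the smallest free space"
# round by round, so every partition in a round ends up the same size.

# Round 1: the 6 TB disks define the size of part1.
sgdisk -n 1:0:0 /dev/sdh                 # whole 6 TB disk as one partition
sgdisk -n 1:0:0 /dev/sdi
P1=$(blockdev --getsz /dev/sdh1)         # its size in 512-byte sectors (partprobe may be needed first)
for d in sda sdb sdc sdd sde sdf sdg; do
    sgdisk -n 1:0:+${P1} /dev/$d         # same-sized part1 on every larger disk
done

# Round 2: the 8 TB disks' remaining space defines part2.
sgdisk -n 2:0:0 /dev/sde
P2=$(blockdev --getsz /dev/sde2)
sgdisk -n 2:0:0 /dev/sdf
sgdisk -n 2:0:0 /dev/sdg
for d in sda sdb sdc sdd; do
    sgdisk -n 2:0:+${P2} /dev/$d
done

# Round 3: whatever is left on the 16 TB disks becomes part3.
for d in sda sdb sdc sdd; do
    sgdisk -n 3:0:0 /dev/$d
done
```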
Final picture:
- 4 disks of 16 TB each (sd[abcd]): part1: 6 TB, part2: 2 TB, part3: 8 TB
- 3 disks of 8 TB each (sd[efg]): part1: 6 TB, part2: 2 TB
- 2 disks of 6 TB each (sd[hi]): part1: 6 TB

```
     +-------------------------+------------+----------------------------------+
sda: | sda1: 6 TB              | sda2: 2 TB | sda3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdb: | sdb1: 6 TB              | sdb2: 2 TB | sdb3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdc: | sdc1: 6 TB              | sdc2: 2 TB | sdc3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sdd: | sdd1: 6 TB              | sdd2: 2 TB | sdd3: 8 TB                       |
     +-------------------------+------------+----------------------------------+
sde: | sde1: 6 TB              | sde2: 2 TB |
     +-------------------------+------------+
sdf: | sdf1: 6 TB              | sdf2: 2 TB |
     +-------------------------+------------+
sdg: | sdg1: 6 TB              | sdg2: 2 TB |
     +-------------------------+------------+
sdh: | sdh1: 6 TB              |
     +-------------------------+
sdi: | sdi1: 6 TB              |
     +-------------------------+
```
Now I have created equally sized partitions on physically different disks that I can use as building blocks for multiple RAIDZ vdevs:
```
#                  |----- 16 TB -----| |--- 8 TB ---| |- 6TB -|
zpool create tank \
    raidz2         sda1 sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 \
    raidz2         sda2 sdb2 sdc2 sdd2 sde2 sdf2 sdg2 \
    raidz2         sda3 sdb3 sdc3 sdd3
# 1st raidz2: part1 (6 TB) on all nine disks
# 2nd raidz2: part2 (2 TB) on the 16 TB and 8 TB disks
# 3rd raidz2: part3 (8 TB) on the 16 TB disks only
```
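Once the pool exists, the standard status commands should confirm that each raidz2 got the intended member partitions:

```
zpool status tank     # vdev tree: three raidz2 groups and their member partitions
zpool list -v tank    # per-vdev size, allocation and free space
```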
I think the following should be true:
- every RAIDZ vdev uses only partitions that are on physically different drives, so if one disk fails, at most one device in each RAIDZ vdev fails
- when one drive fails (say sda), it will degrade all three RAIDZ vdevs, but replacing the disk and repartitioning it the same way should let ZFS recover transparently (I sketch that procedure right after this list)
- since ZFS is said to prefer writes to the vdev with the most free space, and the first RAIDZ will have the most of it, the other RAIDZ vdevs shouldn't see much use until the first 6 TB fill up, so there shouldn't be IOPS bottlenecks. I hope.
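To make that recovery claim concrete, this is roughly what I imagine the replacement would look like - a sketch, assuming sda failed and the new disk shows up under the same name, with the GPT layout copied from a surviving 16 TB disk via sgdisk's backup/restore:

```
# Copy the partition layout from a healthy 16 TB disk (sdb) to the new disk (sda).
sgdisk --backup=/tmp/16tb.gpt /dev/sdb
sgdisk --load-backup=/tmp/16tb.gpt /dev/sda
sgdisk --randomize-guids /dev/sda      # fresh GUIDs so the copy doesn't clash with sdb

# Kick off one resilver per affected raidz2 vdev.
zpool replace tank sda1
zpool replace tank sda2
zpool replace tank sda3
```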
The final touch is to set the IO scheduler to "noop", though I am not sure whether ZFS is intelligent enough to realize that sda1 and sda2 sit on the same spinning-rust device and schedule IO across them accordingly.
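For reference, the one-off version of that would be something like the following (on multi-queue kernels the no-op scheduler is called "none"; a udev rule would be needed to make it stick across reboots):

```
# Set the no-op scheduler on every whole-disk device backing the pool.
for d in sda sdb sdc sdd sde sdf sdg sdh sdi; do
    echo none > /sys/block/$d/queue/scheduler
done
```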
In theory, I don't see why this setup wouldn't work, but I suppose I might be missing something else. Are there any downsides or dangers in running this configuration?