eldakka
Re: Justice for bcachefs!
Not properly, it doesn't re-stripe the existing data like mdadm or btrfs, it just evens out the disk usage.
A 3 disk raid5 expanded to 5 will inherit the same 50% parity overhead for existing data,
And that can be solved by a simple mv and copy-back of the file, e.g.
mv "$i" "$i.tmp" && cp -p "$i.tmp" "$i" && rm "$i.tmp"
Stick that (or your own preference, using rsync for example) in a simple script/find command to recurse it (with appropriate checks/tests etc.), and that'll make the 'old' data stripe 'properly' across the full RAID width.
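Purely as an illustration, a minimal sketch of that loop might look like this (the /tank/data path is a placeholder, and it assumes nothing has the files open for writing while it runs):

#!/bin/bash
# Re-write every regular file under /tank/data so the new copies are
# striped across the full, expanded raidz width.
# /tank/data is a placeholder - point it at your own dataset.
find /tank/data -type f ! -name '*.tmp' -print0 |
while IFS= read -r -d '' i; do
    mv "$i" "$i.tmp" && cp -p "$i.tmp" "$i" && rm "$i.tmp" \
        || { echo "failed on: $i" >&2; break; }
done

Working one file at a time like this also means the extra space needed never exceeds the size of the largest single file.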
eldakka
Re: Justice for bcachefs!
is as much a "simple solution" and so divorced from the behaviour we'd get if ZFS did the re-striping itself* that you may as well say we don't need ZFS to do snapshots for us, we could write our own simple script to, ooh, create a new overlay/passthrough file system, change all the mount points, halt all processes with writable file handles open... (yes, yes, I'm being hyperbolic).
I never said it shouldn't be something ZFS does transparently. I never said it would be a bad idea or an unnecessary thing for ZFS to support.
I was merely pointing out that it is a fairly simple thing to work around, so maybe the unpaid ZFS devs feel they have more important things to work on for now. I mean, it's taken the best part of 20 years to even get the ability to expand a RAIDZ vdev at all.
I'll also say that if anyone actually cares about the filesystem they are using, making conscious decisions to choose a filesystem like ZFS or whatever, then they are not a typical average user. Typical average users don't create ZFS arrays of multiple disks in various raidz/mirror volumes and then grow them. That is not the use-case of an average user.
Later (below) you say "production-ready"; why are you messing around with growing raidz vdevs and wanting to re-stripe them to distribute data across the array? That is a hobbyist/homelab-type situation. If you are using ZFS in a production environment - that is, revenue/income is tied to it - then the answer is to create a new raidz and migrate (zfs send/receive) the data to it. No messing about with growing raidz vdevs and re-striping the data; that's just totally unnecessary.
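To be clear, that migration would be something roughly like the following (pool and disk names are placeholders, and you'd verify the new pool before retiring the old one):

# Build the new pool on the new raidz vdev (pool/disk names are examples only)
zpool create newpool raidz2 sdb sdc sdd sde sdf sdg
# Recursively snapshot the old pool and replicate it, properties and all
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool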
e.g. 'beneath' the user file access level with no possibility of access control issues,
If you run the mv and cp as root, then there will be no access control issues; cp -p (as root) will preserve file permissions and FACLs.
not risking problems when changing your simplistic commands into production-ready "appropriate check/tests etc" like status reports, running automatically, maybe even backing off when there is a momentary load increase so the whole server isn't bogged down as the recursive cp
If your system gets bogged down from doing a single file copy, then I think you have a system problem.
chews the terabytes,
Why would it chew terabytes? Unless you have TB-sized files, it won't. Recursive doesn't mean what I think you think it means. It does not mean "in parallel". The example I gave will work on a single file at a time in a serial process, and will not move on to the next file until the current file is complete (technically it won't move on at all by itself; it's the inner part of a loop you'd need to feed a file list to). Therefore no extra space is needed beyond the size of the file currently being worked on.
not risking losing track when your telnet into the server shell dies
Why would that do anything? At worst you'll have a single $i.tmp file that you might have to manually cp back to the original ($i) name. There will be no data loss (and especially not if you snapshot it first). And even if you 'lose track', just start again; no biggie, it will just take longer as you're redoing some of the work already done.
And as I said, you can use something like rsync instead, which would give you the ability to 'keep track'. The command I pasted was just the simplest one to give an idea of what is needed; just making a new copy of the file will re-stripe it across the full raidz. Or, if you have your pool split up into many smaller filesystems rather than a single one for the entire pool, you can zfs send/receive a filesystem to a new filesystem in the same pool, then use "zfs set mountpoint=<oldmountpoint>" to give the new filesystem the same mountpoint as the old one, then delete the old one (sketched below).
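As a rough sketch of that last option (the dataset names here are made up; only remove the old filesystem once you've verified the new copy):

zfs snapshot tank/data@restripe
zfs send tank/data@restripe | zfs receive -u tank/data_new
zfs set mountpoint=none tank/data          # move the old filesystem out of the way
zfs set mountpoint=/tank/data tank/data_new
# zfs destroy -r tank/data                 # only after checking the new filesystem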
(not risking a brainfart and doing all that copying over the LAN and back again!) - and simply being accessible to Joe Bloggs ZFS user who just would like it all to work, please.
I agree, it would be. But it doesn't. I'm pointing out that there is a solution to the issue the poster I am replying to mentioned. It is annoying to have to do (I've done it when I changed the recordsize of my filesystems), but it can be done, and it's not particularly difficult.
If someone is going to choose something like ZFS, I'd expect them to be able to do internet searches on topics like this and get help from technical forums or various guides that people have written to cover this sort of use-case. There are guides and instructions on how to do this sort of thing.