====ZFS on CentOS 8====
If you use ZFS, there is a chance that a CentOS 8 update will break the ZFS kernel module on the next boot, particularly with point releases. Before updating to a new point release, make sure the ZFS repo from the site listed below already supports that release. KEEP THIS IN MIND, otherwise you'll wonder why all your ZFS volumes disappeared after an update and you can't seem to get them back... To get them back, install the latest ZFS release from the ZFS on Linux site, do a "sudo dnf remove zfs", then a "sudo dnf install zfs", and reboot.

[[https://openzfs.github.io/openzfs-docs/Getting Started/RHEL and CentOS.html]]

  sudo dnf install epel-release wget
  sudo dnf install https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm
  sudo dnf install kernel-devel
  sudo dnf install zfs

Limit the amount of RAM the ZFS ARC cache uses (these options are read when the module is loaded):

  sudo vim /etc/modprobe.d/zfs.conf

Add:

  # Min 2048 MiB / Max 4096 MiB limit
  options zfs zfs_arc_min=2147483648
  options zfs zfs_arc_max=4294967296

Load the ZFS module:

  sudo /sbin/modprobe zfs

====Create Zpool with Free Space====
https://wiki.archlinux.org/index.php/ZFS

In this setup we are going to use the free space on each drive for ZFS.

Get your storage device info:

  sudo lsblk

Create the partitions. Start parted on the disk and show its free space:

  sudo parted -a optimal /dev/sda
  (parted) print free

Take the Start and End values of your free space and use them to create the partition:

  (parted) mkpart primary 72.0GB 500GB
  (parted) print
  (parted) quit

Repeat for each disk that will have free space added to the zpool. Run lsblk to get a list of the partitions that will be used in the zpool.
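Repeating the interactive parted session on every disk is easy to get wrong. Here is a minimal sketch. It assumes every disk has the same free region (the 72.0GB to 500GB example above) and uses hypothetical disk names. The hypothetical helper print_mkpart only prints one non-interactive `parted -s` (script mode) command per disk, so you can check each line against `print free` before running anything.

```shell
#!/usr/bin/env bash
# Dry run: print one non-interactive parted command per disk.
# ASSUMPTION: every disk has the same free region; confirm with
# "sudo parted /dev/sdX print free" before running the printed commands.
# print_mkpart is a hypothetical helper name.

# print_mkpart START END DISK...
print_mkpart() {
  local start=$1 end=$2
  shift 2
  local disk
  for disk in "$@"; do
    # -s: script mode (no prompts); -a optimal: align partitions for performance
    printf 'sudo parted -s -a optimal %s mkpart primary %s %s\n' \
      "$disk" "$start" "$end"
  done
}

# Hypothetical disk list; adjust it to what lsblk shows on your system.
print_mkpart 72.0GB 500GB /dev/sda /dev/sdb
```

Once the printed commands look right, you can run them one by one or pipe the output to `sh`.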
Create the zpool (here we are using spinning disks and mirroring them):

  sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4

Or the equivalent of RAID10 with 4 disks:

  sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4 mirror /dev/sdc4 /dev/sdd4

https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/ \\
Use ashift=13 on any Samsung SSD 850 era and newer.

Note: when I did this, parted showed a 428GB partition, lsblk showed 398.7G and df -h showed 386G for the mounted ZFS volume. This is mostly units and reservation. Parted reports decimal gigabytes, and 428 GB is about 398.6 GiB, which is the figure lsblk prints. ZFS also holds back roughly 1/32 of the pool as reserved "slop" space, which brings 398.7 GiB down to about 386 GiB.

====Create Zpool with Whole Drive====
https://www.svennd.be/create-a-zfs-mirror-pool/

In this setup we are going to use whole drives for ZFS.

Get your storage device info:

  sudo lsblk

Create the zpool (here we are using spinning disks and mirroring them):

  sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sdc /dev/sdd

https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/ \\
Use ashift=13 on any Samsung SSD 850 era and newer.

Create datasets in the ZFS pool (we use a separate dataset per VM image for snapshot purposes):

  sudo zfs create bigdata/vm_guest_name

If you use a dataset per VM, create the dataset first and then create the VM inside it. Moving the VM in afterward is like moving between real partitions and will take a while. Also, create RAW images with qemu-img instead of virt-gui. virt-gui defaults to falloc allocation, which takes forever. Use this instead:

  sudo qemu-img create -f raw -o preallocation=off TEST.img 50G

====ZFS Misc Info====
https://www.cyberciti.biz/faq/freebsd-linux-unix-zfs-automatic-mount-points-command/ \\
http://fibrevillage.com/storage/168-zfs-pool-zfs-datasets-and-zfs-volumes

==Mounts==
ZFS file systems are mounted automatically at boot, so there is no entry for them in fstab.
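Because these mounts don't come from fstab, it's worth checking after a reboot that /mnt/data (the mountpoint used in the zpool create examples) really is a ZFS mount and not just an empty directory on the root filesystem. `zfs mount` with no arguments lists the mounted ZFS filesystems. The minimal sketch below reads the kernel mount table instead, which is convenient in scripts. is_zfs_mount is a hypothetical helper name. Mountpoints containing spaces are not handled, because the kernel escapes them as \040.

```shell
#!/usr/bin/env bash
# Check whether a directory is currently mounted as a ZFS filesystem by
# reading the kernel mount table (fields: device mountpoint fstype options ...).
# is_zfs_mount is a hypothetical helper name; paths with spaces are not handled.

# is_zfs_mount MOUNTPOINT [MOUNTS_FILE] -> exit status 0 if it is a ZFS mount
is_zfs_mount() {
  local mountpoint=$1 table=${2:-/proc/mounts}
  awk -v mp="$mountpoint" '$2 == mp && $3 == "zfs" { found = 1 } END { exit !found }' "$table"
}

if is_zfs_mount /mnt/data; then
  echo "/mnt/data is mounted from ZFS"
else
  echo "/mnt/data is NOT a ZFS mount (did the pool import?)" >&2
fi
```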
To get a list of your zpools, run:

  zpool list

==Disk Cache==
You may come across a RAID controller that can run in HBA mode and still lets you enable or disable the disk caches of non-RAID disks. If so, enable them: https://serverfault.com/questions/995702/zfs-enable-or-disable-disk-cache

Quote from the link: The rationale is that ZFS assumes enabled disk cache and so flushes any critical writes (ie: sync write and uberblock rewrite) via appropriate and specific SATA/SAS commands (ATA FLUSH, FUAs, etc).

====ZFS Optimization====
Here vg_images is the pool name:

  sudo zfs set xattr=sa vg_images
  sudo zfs set acltype=posixacl vg_images
  sudo zfs set compression=lz4 vg_images
  sudo zfs set atime=off vg_images
  sudo zfs set relatime=off vg_images

Child datasets inherit these properties from their parent unless you set them otherwise.

====TRIM on SSDs====
Here tank is the pool name. To run TRIM:

  sudo zpool trim tank

To check TRIM status:

  sudo zpool status -t tank

To make it automatic:

  sudo zpool set autotrim=on tank

====Resilvering/Scrubbing====
https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/

A scrub reads every block in the pool and verifies it against its checksum. If the pool has redundancy (2 or more drives in a mirror or RAIDZ), corrupt data is also repaired from a good copy. The following commands start a scrub, check its status, and stop it if needed:

  zpool scrub tank
  zpool status tank
  zpool scrub -s tank

You should do this weekly if you are using cheap drives like I am:

  sudo vim /etc/crontab

Add the following. It runs every Sunday at 02:00, and entries in /etc/crontab need the user field, root here:

  0 2 * * 0 root /sbin/zpool scrub tank

====Info Commands====
Get info about zpools:

  sudo zpool list -v

Get even more info:

  sudo zpool get all

You get different info if you query the pool's root dataset. The pool (zpool get) and its datasets (zfs get) have separate property sets, so options set with zfs set, such as compression and atime, show up on the dataset but not on the pool:

  sudo zfs get all vg_images
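To see where a dataset's properties come from (set locally, inherited from a parent, or defaults), use `zfs get -H -o property,value,source all DATASET`. It prints tab-separated rows that include a source column. Below is a minimal sketch of a filter over that output. props_from is a hypothetical helper, and vg_images/vm1 is a hypothetical child of the vg_images pool from the optimization section. For simple cases, `zfs get -s local all vg_images` filters by source natively.

```shell
#!/usr/bin/env bash
# Filter "zfs get -H -o property,value,source" output (tab-separated rows)
# and print property=value for every row whose source column starts with
# PREFIX, e.g. "local", "default" or "inherited from vg_images".
# props_from is a hypothetical helper name.

props_from() {
  local prefix=$1
  awk -F '\t' -v p="$prefix" 'index($3, p) == 1 { print $1 "=" $2 }'
}

# Example (requires ZFS; vg_images/vm1 is a hypothetical child dataset):
#   zfs get -H -o property,value,source all vg_images/vm1 | props_from "inherited from vg_images"
```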
Get device hardware info for zpools:

  zpool status -c vendor,model,size

Get info on datasets:

  sudo zfs list

Get info on space used by datasets and snapshots: \\
https://www.thegeekdiary.com/how-to-find-the-space-consumed-by-zfs-snapshots/

  zfs list -o space -r rpool

From this output, you can see the amount of space that is:
  * Available on each file system (AVAIL)
  * Being used (USED)
  * Being consumed by snapshots of each data set (USEDSNAP)
  * Being used by the data set itself (USEDDS)
  * Being used by a refreservation set on the data set (USEDREFRESERV)
  * Being used by the children of this data set (USEDCHILD)

====ZFS Snapshot====
https://briankoopman.com/zfs-automated-snapshots/ \\
https://pthree.org/2012/12/19/zfs-administration-part-xii-snapshots-and-clones/

Finally, aside from the data integrity inherent to using ZFS, it was the efficient snapshotting that I was after (paired with send/receive).

Take a snapshot of tank/test:

  zfs snapshot tank/test@tuesday

List ZFS snapshots:

  zfs list -r -t snapshot tank

Destroy a specific snapshot of tank/test. You can destroy any snapshot as long as it doesn't have a clone, because snapshots are independent of each other:

  zfs destroy tank/test@2012:12:18:51:2:19:15

Destroy all snapshots of tank/test. A % range with both ends left empty means every snapshot, and -r includes child datasets:

  sudo zfs destroy -r tank/test@%

Roll back to a snapshot. If there are snapshots newer than the one you are rolling back to, zfs rollback refuses unless you add -r, which deletes those newer snapshots:

  zfs rollback -r tank/test@tuesday

If you aren't sure which snapshot you want to roll back to, or you need to access several points in the snapshot history, clone the snapshot and inspect the clone before rolling back:

  zfs clone tank/test@tuesday tank/tuesday

My testing of snapshots for VM guest images: snapshots are taken without notifying the guest OS, so I expected restoring one to be like the power was pulled.
  * Rolling back a running raw Windows Server: it looked fine at first, and the newly installed program that should have disappeared even started to open. Then it bluescreened, rebooted, ran chkdsk, fixed a bunch of stuff and booted into Windows. I wouldn't trust this install anymore, though. But it was really fun. I used to enjoy writing 0s to running Win98SE machines to see when they'd crash, and this brings back those days. On login the familiar "unexpected shutdown" prompt came up. No trace was found of the application that had been downloaded and installed after the snapshot was created.
  * Snapshotting/rolling back a shut-down raw Windows Server: nothing special, no prompts and all looks good. It's just like backing up an offline image.
  * Using a clone to create a new copy of a raw Windows Server: fantastic. Creating the clone was fast and the clone booted fast. Windows did prompt about the unexpected shutdown, so it was as if the power was pulled.
  * Snapshot of a clone: works!
  * All of this is quick, easy and lovely. What have I been doing all these years without ZFS??? MDRAID and XFS are still very good too, and there are excellent use cases for them, but boy am I happy to have finally tried ZFS. Yes, it's not as fast, but my time isn't cheap either, and frequent cheap backups and less time spent administering are very valuable. Buy faster storage if performance and reliability/ease of administration are important to you.
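Since snapshots are cheap, they pile up. Here is a minimal sketch of a retention rule to go with the zfs destroy commands in the snapshot section: keep the newest N snapshots of a dataset and list the rest. `zfs list -H -t snapshot -o name -s creation -d 1 DATASET` prints the dataset's own snapshots oldest first. The hypothetical helper snaps_to_prune then prints all but the last N lines. It relies on GNU head's negative line count (available on CentOS) and destroys nothing by itself.

```shell
#!/usr/bin/env bash
# Print all but the newest KEEP snapshot names read from stdin.
# Input must be sorted oldest -> newest, which is what
#   zfs list -H -t snapshot -o name -s creation -d 1 DATASET
# produces. Nothing is destroyed here; the output is meant for review.
# snaps_to_prune is a hypothetical helper name.

snaps_to_prune() {
  local keep=$1
  # GNU head: a negative count prints everything except the last KEEP lines
  head -n "-$keep"
}

# Example (requires ZFS; tank/test is the dataset from the examples above):
#   zfs list -H -t snapshot -o name -s creation -d 1 tank/test \
#     | snaps_to_prune 7 | xargs -r -n1 sudo zfs destroy
```

Run it without the final xargs stage first and check the list. The retention count of 7 is arbitrary.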
==Delete all Snapshots with a cold_backup in the Name==
https://blandname.com/2012/04/09/clear-all-zfs-snapshots/

I use this to delete all cold backups (snapshots of powered-off virtual guest disk images):

  sudo zfs list -H -o name -t snapshot | grep cold_backup | xargs -n1 sudo zfs destroy

==Create Snapshot with Specific Name for all Datasets==
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/6mhupg6o2/index.html

I use this to create snapshots of powered-off virtual guest disk images:

  sudo zfs snapshot -r vg_images@cold_backup

To help clarify what rolling back a snapshot does, from Oracle: You can use the zfs rollback command to discard all changes made to a file system since a specific snapshot was created. The file system reverts to its state at the time the snapshot was taken.

In other words, the newer snapshots didn't exist at that point, which is why rolling back removes them. You can still delete a snapshot from the middle of the list without problems. Does this mean that if I roll back to a snapshot later than the deleted one, the deleted snapshot will reappear??? No, it doesn't appear so, at least on the backup host where I use syncoid to replicate snapshots. When a snapshot is deleted, the snapshots immediately before and after it seem to grow in size, while those further away are unaffected. This fits how snapshot space is counted: blocks that were shared only with the deleted snapshot are now charged to its neighbors.

====ZFS Send/Receive====
https://www.thegeekdiary.com/zfs-tutorials-creating-zfs-snapshot-and-clones/

We are already having fun; now it's time to be practical. Snapshots are easy, fast and beautiful in their operation, so let's see if send/receive is as endearing. It's ultimately the most important feature of the set, since it makes a real backup (that is, to a different data volume/computer/etc.).

  zfs send geekpool/fs1@oct2013 | ssh node02 "zfs receive testpool/testfs"

==Non-root user==
I prefer to use sudo for everything, for security and consistency. ZFS send/receive was a problem because of this; see the permissions section for how to deal with it.
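The send example above transfers the whole dataset every time. Once both sides hold a common snapshot, `zfs send -i OLD NEW` sends only the changes between the two snapshots, which is what makes regular replication cheap. Below is a minimal dry-run sketch. The hypothetical helper send_cmd prints either a full or an incremental pipeline, reusing the dataset, user and address from the examples in this document. Running sudo on the remote side assumes the permissions setup that follows. If the target dataset has been modified since the common snapshot, the incremental receive fails unless you use `zfs receive -F`, which rolls the target back first.

```shell
#!/usr/bin/env bash
# Dry run: print a full or incremental "zfs send | ssh ... zfs receive" pipeline.
# send_cmd SRC_DATASET NEW_SNAP USER@HOST DEST_DATASET [PREV_SNAP]
# send_cmd is a hypothetical helper name.

send_cmd() {
  local src=$1 new=$2 dest=$3 dest_ds=$4 prev=${5:-}
  if [ -n "$prev" ]; then
    # Incremental: only the changes between @prev and @new are sent.
    # The receiving dataset must already have @prev.
    printf 'sudo zfs send -i %s@%s %s@%s | ssh %s "sudo zfs receive %s"\n' \
      "$src" "$prev" "$src" "$new" "$dest" "$dest_ds"
  else
    # Full stream: used for the first replication of a dataset.
    printf 'sudo zfs send %s@%s | ssh %s "sudo zfs receive %s"\n' \
      "$src" "$new" "$dest" "$dest_ds"
  fi
}

send_cmd vg_images/SERVER2K19-TEST cold_backup pladmin@172.18.18.234 vg_images/SERVER2K19-TEST
```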
====ZFS Permissions/Access====
https://docs.oracle.com/cd/E88353_01/html/E72487/zfs-allow-8.html

If you are going to use ZFS send/receive as a non-root user, you need to set up permissions for it in ZFS. Here we add permissions for the "wheel" group on the zpool vg_images:

  sudo zfs allow -g wheel compression,clone,create,destroy,hold,promote,receive,rollback,send,snapshot,mount,mountpoint vg_images

To show the permissions:

  sudo zfs allow vg_images

Note: mount and unmount can't be delegated under ZFS on Linux. I was still able to assign the mount permission, though, and it seemed to be necessary to "zfs destroy" a snapshot, whereas creating the snapshot didn't require it (09-28-2020, CentOS 8).

NOTE: the visudo info below doesn't appear to be necessary with the above "zfs allow" commands applied. I'm leaving it here until that is firmly established.

If you want non-root accounts to run commands without typing a sudo password, modify the sudoers file to allow specific commands to run without a password. Be as granular and restrictive as possible, evaluate how it could be abused, then decide whether it's the right option.

  sudo visudo

Add the following (https://github.com/jimsalterjrs/sanoid/issues/522):

  # Syncoid commands
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs get *
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *
  #pladmin ALL=(ALL) NOPASSWD:/sbin/zfs send *
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs receive *
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *@syncoid_cdn*
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *tank/video*@*
  # We only want to destroy snapshots
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *@syncoid_cdn*
  pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *tank/video*@*

Where: \\
pladmin - username on the remote (slave) server. \\
tank/video - the ZFS pool and dataset being synchronized. \\
cdn - the beginning of the master server's hostname.
Example that worked with the above sudoers file:

  sudo zfs send vg_images/SERVER2K19-TEST@cold_backup | ssh pladmin@172.18.18.234 "sudo zfs receive vg_images/SERVER2K19-TEST"

On the backup server, create one user for each virtual host. Use zfs allow to give each user permissions on its own virtual host's backup dataset, e.g.:

  sudo zfs allow -u vhsrv01_backup_user clone,create,destroy,hold,promote,receive,rollback,send,snapshot vg_images/vg_backups/vhsrv01_vg_images

Then add NOPASSWD entries to the sudoers file for the zfs functions each user needs, limited to that user's dataset, e.g.:

  vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs get *vg_images/vg_backups/vhsrv01_vg_images*
  vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *vg_images/vg_backups/vhsrv01_vg_images*
  vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs receive *vg_images/vg_backups/vhsrv01_vg_images*
  vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs rollback *vg_images/vg_backups/vhsrv01_vg_images*
  vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs destroy *vg_images/vg_backups/vhsrv01_vg_images*

Consider reversing the send/receive direction. The backup server would SSH into the source server, initiate the send there and receive the stream locally. That way the backups can't be destroyed by the source server if it's compromised.
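With several virtual hosts, the per-user sudoers entries above are the same five rules with a different user and dataset each time. Below is a minimal sketch that prints them for one user/dataset pair. sudoers_rules is a hypothetical helper, and the vhsrv02 names are also hypothetical. Put the output in a file under /etc/sudoers.d and check it with `sudo visudo -cf FILE` before relying on it. Keep in mind that sudoers wildcards can match across spaces. These rules narrow what each user can run, but they are not a strict boundary.

```shell
#!/usr/bin/env bash
# Print NOPASSWD sudoers rules that limit one backup user to its own dataset.
# sudoers_rules USER DATASET
# sudoers_rules is a hypothetical helper name.

sudoers_rules() {
  local user=$1 dataset=$2 sub
  # Same five zfs subcommands as the hand-written vhsrv01 example
  for sub in get snapshot receive rollback destroy; do
    printf '%s ALL=(ALL) NOPASSWD:/sbin/zfs %s *%s*\n' "$user" "$sub" "$dataset"
  done
}

# Hypothetical second virtual host; review the output, then validate with visudo -cf
sudoers_rules vhsrv02_backup_user vg_images/vg_backups/vhsrv02_vg_images
```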
==ZFS Snapshot + ZFS Send/Receive Example==
This is for virtual guests on a KVM/libvirt host:

  sudo virsh shutdown ROOT-CA
  sudo virsh list    # repeat until ROOT-CA no longer shows as running
  sudo zfs snapshot vhsrv03_vg_images/root-ca@cold_backup
  sudo zfs send vhsrv03_vg_images/root-ca@cold_backup | ssh vundermin@172.18.18.171 "sudo zfs receive vhsrv01_vg_images/root-ca"

====Recovery====
https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwc.html \\
https://github.com/xuanngo2001/cust-live-deb/issues/298 \\
https://github.com/openzfs/zfs/issues/10667

I haven't tried any "undelete" or deleted-file scanning utilities. Some are listed online, and since ZFS is CoW maybe they work...

I had a power outage that lasted long enough to drain the 10-year-old battery on a RAID controller (please laugh for reasons you might know). CentOS didn't boot, and both the /boot and / filesystems had to be repaired. ZFS wouldn't come up at all. The OS would stop booting at "a start job is running for import zfs pools by cache file". Booting into recovery and moving/deleting /etc/zfs/zpool.cache allowed the system to boot, but no zpool was listed, and zpool import failed.

The following looked like it might work, but it caused a kernel panic after a while (import the pool without the cache file):

  zpool import -FfmX pool5

The following allowed me to access the pool in read-only mode and actually recover data!

  zpool import -o readonly=on -f pool0