ZFS on CentOS 8
If you use ZFS, there is a chance that updates to CentOS 8 will break the module on next boot, particularly on point releases. Before updating to a new point release, make sure that support for that release is available via an updated repo from the ZFS site listed below. KEEP THIS IN MIND, otherwise you'll wonder why all your ZFS volumes disappeared after an update and you can't seem to get them back… To get them back, install the latest ZFS release from the ZFS on Linux site, do a “sudo dnf remove zfs” then a “sudo dnf install zfs” and reboot…
https://openzfs.github.io/openzfs-docs/Getting%20Started/RHEL%20and%20CentOS.html
sudo dnf install epel-release wget
sudo dnf install https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm
sudo dnf install epel-release
sudo dnf install kernel-devel
sudo dnf install zfs
Install:
sudo dnf install zfs
Limit the amount of RAM it uses:
sudo vim /etc/modprobe.d/zfs.conf
Add:
# Min 2048MB / Max 4096 MB Limit
options zfs zfs_arc_min=2147483648
options zfs zfs_arc_max=4294967296
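The two byte values are just the MiB figures multiplied by 1024 × 1024. A small sketch for generating the option lines for other limits; the helper name mib_to_bytes is made up for illustration, and the output is only printed so it can be reviewed before going into /etc/modprobe.d/zfs.conf:

```shell
# Hypothetical helper: convert MiB to bytes using shell integer arithmetic.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the modprobe option lines for a 2048 MiB minimum and 4096 MiB maximum ARC.
printf 'options zfs zfs_arc_min=%s\n' "$(mib_to_bytes 2048)"
printf 'options zfs zfs_arc_max=%s\n' "$(mib_to_bytes 4096)"
```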
Load ZFS module
sudo /sbin/modprobe zfs
Create Zpool with Free Space
https://wiki.archlinux.org/index.php/ZFS
In this setup we are going to use the free space on each drive for ZFS.
Get your storage device info
sudo lsblk
Create partitions
sudo parted -a optimal /dev/sda
print free
Take the Start and End values of your free space and use them to create the partition (still at the parted prompt)
mkpart primary 72.0GB 500GB
print
quit
Repeat for each disk that will have free space added to the zpool. Run
lsblk
to get a list of the partitions that will be used in the zpool.
Create the zpool (here we are using spinning disks and mirroring them)
sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4
Or the equivalent of RAID10 with 4 disks
sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4 mirror /dev/sdc4 /dev/sdd4
https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/
Use ashift=13 on any Samsung SSD 850 era and newer.
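For reference, ashift is the base-2 logarithm of the sector size the pool will assume, so ashift=12 means 4096-byte sectors and ashift=13 means 8192-byte sectors. A quick way to check the arithmetic (the function name is just for illustration):

```shell
# Hypothetical helper: sector size in bytes implied by an ashift value (2^ashift).
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

ashift_to_bytes 12
ashift_to_bytes 13
```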
Note: when I did this parted showed a 428GB partition but df -h shows 386GB for the mounted zfs volume… strange. Well lsblk shows a 398.7G partition… oh well for now, it's late.
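A likely explanation, offered as an assumption rather than something verified on this system: parted reports decimal gigabytes (10^9 bytes) by default while lsblk reports binary units (2^30 bytes), and ZFS holds back part of every pool as slop space (1/32 of the pool, assuming the default spa_slop_shift of 5). The rough arithmetic lines up with the numbers in the note; both helper names are made up for illustration:

```shell
# parted's decimal GB expressed in GiB, which is the unit lsblk prints.
parted_gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1e9 / (1024 ^ 3) }'
}

# Assumption: ZFS reserves 1/32 of the pool as slop space by default.
after_slop() {
    awk -v gib="$1" 'BEGIN { printf "%.1f\n", gib * 31 / 32 }'
}

parted_gb_to_gib 428    # close to the 398.7G reported by lsblk
after_slop 398.7        # close to the 386G reported by df -h
```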
Create Zpool with Whole Drive
https://www.svennd.be/create-a-zfs-mirror-pool/
In this setup we are going to use whole drives for ZFS.
Get your storage device info
sudo lsblk
Create the zpool (here we are using spinning disks and mirroring them)
sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sdc /dev/sdd
https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/
Use ashift=13 on any Samsung SSD 850 era and newer.
Create datasets in the ZFS pool (we use separate datasets per vm image for snapshot purposes)
sudo zfs create bigdata/vm_guest_name
If using datasets per VM, create the dataset first and then the VM in the dataset; moving it afterward is like moving between real partitions and will take a while. Also, create RAW images via qemu-img instead of virt-gui, as it defaults to falloc allocation which will take forever. Use this instead:
sudo qemu-img create -f raw TEST.img 50G -o preallocation=off
ZFS Misc Info
https://www.cyberciti.biz/faq/freebsd-linux-unix-zfs-automatic-mount-points-command/
http://fibrevillage.com/storage/168-zfs-pool-zfs-datasets-and-zfs-volumes
Mounts
ZFS file systems are automatically mounted at boot so there is no entry in fstab. To get a list of your zpools run:
zpool list
Disk Cache
If you come across a RAID controller that can be used in HBA mode but still lets you control whether non-RAID disks have their disk caches enabled, enable them: https://serverfault.com/questions/995702/zfs-enable-or-disable-disk-cache
Quote from link: The rationale is that ZFS assumes enabled disk cache and so flushes any critical writes (ie: sync write and uberblock rewrite) via appropriate and specific SATA/SAS commands (ATA FLUSH, FUAs, etc).
ZFS Optimization
Here vg_images is the pool name
sudo zfs set xattr=sa vg_images
sudo zfs set acltype=posixacl vg_images
sudo zfs set compression=lz4 vg_images
sudo zfs set atime=off vg_images
sudo zfs set relatime=off vg_images
It seems sub-datasets inherit the properties of their parent unless you specify otherwise.
TRIM on SSDs
To run trim:
sudo zpool trim <poolname>
To check trim status:
sudo zpool status -t <poolname>
To make it automatic
sudo zpool set autotrim=on <poolname>
Resilvering/Scrubbing
https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/
If you have 2 or more drives in an array then you can use scrubbing to verify the data and repair it if it is corrupt. The following commands start the scrub, check the status, and stop it if needed:
zpool scrub tank
zpool status tank
zpool scrub -s tank
You should do this weekly if you are using cheap drives like I am:
sudo vim /etc/crontab
Add (entries in /etc/crontab need a user field, here root):
0 2 * * 0 root /sbin/zpool scrub tank
Info Commands
Get info about zpool
sudo zpool list -v <poolname>
Get even more info
sudo zpool get all <poolname>
It seems you get even more info if you query the dataset of a zfs pool… I saw all my options that were set at the pool level on the dataset but not on the pool.
Get device hardware info of zpools
zpool status -c vendor,model,size
Get info on datasets
sudo zfs list
Get info on space used by datasets and snapshots
https://www.thegeekdiary.com/how-to-find-the-space-consumed-by-zfs-snapshots/
zfs list -o space -r rpool
From this output, you can see the amount of space that is:
- Available on each file system
- Being used
- Being consumed by snapshots of each data set (USEDSNAP)
- Being used by the data set itself (USEDDS)
- Being used by a refreservation set on the data set (USEDREFRESERV)
- Being used by the children of this data set (USEDCHILD)
ZFS Snapshot
https://briankoopman.com/zfs-automated-snapshots/
https://pthree.org/2012/12/19/zfs-administration-part-xii-snapshots-and-clones/
Finally, aside from the data integrity inherent to using ZFS, it was the efficient snapshotting that I was after (well, paired with send/receive).
Take a snapshot of tank
zfs snapshot tank/test@tuesday
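For scripted snapshots a date-based name is handy. A small sketch, assuming a name format of YYYY-MM-DD_HHMMSS (chosen here for illustration, not anything ZFS requires); the function only prints the name so it can be checked before use:

```shell
# Hypothetical helper: build dataset@timestamp for use with zfs snapshot.
snapshot_name() {
    echo "$1@$(date +%Y-%m-%d_%H%M%S)"
}

snapshot_name tank/test
# To actually take the snapshot: sudo zfs snapshot "$(snapshot_name tank/test)"
```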
List ZFS snapshots
zfs list -r -t snapshot tank
Destroy specific snapshot from tank/test (note, you can destroy any snapshot you want as long as it doesn't have a clone, snapshots are independent of other snapshots).
zfs destroy tank/test@2012:12:18:51:2:19:15
Destroy all snapshots from tank/test
sudo zfs destroy -r tank/test@%
Roll back to a snapshot (note: if there are snapshots newer than the one you are rolling back to, they have to be destroyed as part of the rollback; add the -r switch to the command below to allow this)
zfs rollback tank/test@tuesday
And let's say you aren't sure which snapshot you want to roll back to, or you need to access multiple stages of the snapshot history: you could clone the snapshot, then open and inspect the clone before rolling back
zfs clone tank/test@tuesday tank/tuesday
My testing of snapshots for VM guest images… snapshots are all done without notifying the guest OS so I'm guessing restoring will be like power was pulled.
- rolling back a running raw windows server=it looks fine at first, the newly installed program that should have disappeared even started to open but then it bluescreened, rebooted, did a chkdsk and fixed a bunch of stuff then booted into Windows. I wouldn't trust this install anymore though. But it was really fun, I used to enjoy writing 0s to running Win98SE machines to see when they'd crash… this brings back those days. On login the familiar “unexpected shutdown” prompt came up. No trace was found of the application which had been downloaded and installed after the snapshot was created.
- snapshotting/rolling back a shutdown raw windows server=nothing special, no prompts and all looks good, it's just like backing up an offline image.
- using a clone to create a new copy of a raw windows server=fantastic, it was fast and the clone booted fast, windows did prompt for the unexplained shutdown so it was as if the power was pulled.
- snapshot of a clone works!
- all of this is quick and easy and lovely, what have I been doing all these years without ZFS??? MDRAID and XFS are still very good too and there are excellent use cases for them, but boy am I happy to have finally tried ZFS (yes, it's not as fast but my time isn't cheap either, frequent cheap backups and less time spent administering is very valuable, buy faster storage if performance and reliability/ease of administration are important to you).
Delete all Snapshots with a cold_backup in the Name
https://blandname.com/2012/04/09/clear-all-zfs-snapshots/
I use this to delete all cold backups (snapshots of powered off virtual guest disk images)
sudo zfs list -H -o name -t snapshot | grep cold_backup | xargs -n1 sudo zfs destroy
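Since zfs destroy is not reversible, it can be worth previewing what the pipeline will match first. A sketch that only filters snapshot names and prints them; match_snapshots is a made-up helper name, and the pattern is anchored after the @ so that a dataset that merely has cold_backup in its own name is not picked up:

```shell
# Hypothetical helper: read snapshot names on stdin and keep those whose
# snapshot part (the text after the @) contains the given tag.
match_snapshots() {
    grep -- "@.*$1" || true
}

# Preview only, nothing is destroyed:
#   sudo zfs list -H -o name -t snapshot | match_snapshots cold_backup
# When the list looks right, append: | xargs -n1 sudo zfs destroy
```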
Create Snapshot with Specific Name for all Datasets
https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/6mhupg6o2/index.html
I use this to create snapshots of powered off virtual guest disk images
sudo zfs snapshot -r vg_images@cold_backup
To help clarify what rolling back a snapshot does, from Oracle: You can use the zfs rollback command to discard all changes made to a file system since a specific snapshot was created. The file system reverts to its state at the time the snapshot was taken. (So at the time it was taken the other snapshots didn't exist; however, you can delete a snapshot in the middle of the list of snapshots and it's ok.) Does this mean that if I roll back to a snapshot later than the deleted one, the deleted snapshot will reappear??? No, it doesn't appear so, at least on the backup host where I use syncoid to replicate snapshots. When deleting a snapshot, it seems the snapshots immediately before and after it increase in size, while those earlier or later than that seem to be unaffected.
ZFS Send/Receive
https://www.thegeekdiary.com/zfs-tutorials-creating-zfs-snapshot-and-clones/
We are already having fun, now it's time to be practical. Snapshots are easy, fast and beautiful in their operation, now let's see if send/receive is as endearing, though it's ultimately the most important feature of the set since it's a real backup (that is, to a different data volume/computer/etc).
zfs send geekpool/fs1@oct2013 | ssh node02 "zfs receive testpool/testfs"
Non-root user
I prefer to use sudo for everything just for security and consistency. ZFS send/receive was a problem because of this. See the permissions section on how to deal with this.
ZFS Permissions/Access
https://docs.oracle.com/cd/E88353_01/html/E72487/zfs-allow-8.html
If you are going to use ZFS send/receive as a non-root user then you're going to need to set up permissions to do so in ZFS.
We are going to add permissions to the “wheel” group for the zpool vg_images
sudo zfs allow -g wheel compression,clone,create,destroy,hold,promote,receive,rollback,send,snapshot,mount,mountpoint vg_images
To show permissions
sudo zfs allow vg_images
Note: mount and unmount reportedly can't be delegated under ZFS on Linux, though I was able to assign mount, and it seemed necessary for me to “zfs destroy” a snapshot, whereas creating the snapshot didn't require this permission (09-28-2020, CentOS 8).
NOTE: info below on visudo doesn't appear to be necessary with the above “zfs allow” commands applied, leaving for now until firmly established.
If you want to run commands on non-root accounts without typing in a sudo password then modify the sudoers file to allow specific commands to be run without a password (be as granular and restrictive as possible and evaluate how it can be abused then determine if it's the right option).
sudo visudo
Add the following (https://github.com/jimsalterjrs/sanoid/issues/522)
# Syncoid commands
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs get *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *
#pladmin ALL=(ALL) NOPASSWD:/sbin/zfs send *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs receive *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *@syncoid_cdn*
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *tank/video*@*
# We only want to destroy snapshots
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *@syncoid_cdn*
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *tank/video*@*
Where:
pladmin - username on the remote (slave) server.
tank/video - the name of the ZFS pool and dataset that are being synchronized
cdn - the beginning of the master server's hostname.
Example that worked with above sudoers file
sudo zfs send vg_images/SERVER2K19-TEST@cold_backup | ssh pladmin@172.18.18.234 "sudo zfs receive vg_images/SERVER2K19-TEST"
Create a user on the backup server for each virtual host, and use zfs allow to give that user permissions on the virtual host's backup dataset, e.g.:
sudo zfs allow -u vhsrv01_backup_user clone,create,destroy,hold,promote,receive,rollback,send,snapshot vg_images/vg_backups/vhsrv01_vg_images
Then add NOPASSWD entries in the sudoers file for the zfs functions each user needs, limited to that user's own dataset, e.g.:
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs get *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs receive *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs rollback *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs destroy *vg_images/vg_backups/vhsrv01_vg_images*
Think about reversing the send/receive command so that it is the backup server that SSHes into the source server, initiates the send, and receives locally. This way the backups can't be destroyed by the source server if it's compromised.
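A rough sketch of that reversed (pull) direction, run on the backup server. The function name pull_backup and the DRY_RUN switch are made up for illustration, and it assumes the backup server's user can SSH to the source and run "sudo zfs send" there without a password:

```shell
# Hypothetical helper, run on the BACKUP server.
# $1 = user@source-host, $2 = snapshot on the source, $3 = local target dataset
pull_backup() {
    src_host="$1"
    snap="$2"
    dest="$3"
    # With DRY_RUN=1 only print the pipeline that would be run.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'ssh %s "sudo zfs send %s" | sudo zfs receive %s\n' "$src_host" "$snap" "$dest"
        return 0
    fi
    # The send runs on the source over SSH, the receive runs locally.
    ssh "$src_host" "sudo zfs send $snap" | sudo zfs receive "$dest"
}
```

With this direction the source server needs no credentials for the backup server at all; only the send permission has to be granted on the source side.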
ZFS Snapshot + ZFS Send/Receive Example
This is for virtual guests on a KVM/libvirt host
sudo virsh shutdown ROOT-CA
sudo virsh list   # repeat until ROOT-CA is shut down
sudo zfs snapshot vhsrv03_vg_images/root-ca@cold_backup
sudo zfs send vhsrv03_vg_images/root-ca@cold_backup | ssh vundermin@172.18.18.171 "sudo zfs receive vhsrv01_vg_images/root-ca"
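Instead of re-running virsh list by hand, the wait can be scripted. A minimal sketch, assuming virsh domstate prints "shut off" for a stopped guest (which depends on the locale being English); the function name, the 5 second interval and the default of 60 tries are arbitrary choices for illustration:

```shell
# Hypothetical helper: poll until the guest reports "shut off".
# $1 = guest name, $2 = number of tries (default 60); returns 1 on timeout.
wait_for_shutdown() {
    guest="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        state="$(sudo virsh domstate "$guest" 2>/dev/null)"
        if [ "$state" = "shut off" ]; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 5
    done
    return 1
}

# Usage:
#   sudo virsh shutdown ROOT-CA && wait_for_shutdown ROOT-CA && \
#       sudo zfs snapshot vhsrv03_vg_images/root-ca@cold_backup
```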
Recovery
https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwc.html
https://github.com/xuanngo2001/cust-live-deb/issues/298
https://github.com/openzfs/zfs/issues/10667
There are no “undelete” or scanning for deleted files utilities that I've tried, though some are listed and since ZFS is CoW maybe they work…
I had an issue with a power outage that lasted long enough to drain the 10 year old battery on a RAID controller (please laugh for reasons you might know). CentOS didn't boot and both /boot and / filesystems had to be repaired. ZFS wouldn't come up at all, the OS would stop booting at the point “a start job is running for import zfs pools by cache file”. Booting to recovery then moving/deleting /etc/zfs/zpool.cache allowed the system to boot, but no zpool was listed. When doing zpool import it would fail.
The following looked like it might work but caused a kernel panic after a while (import pool without cache file):
zpool import -FfmX pool5
The following allowed me to access it in readonly mode and actually recover data!
zpool import -o readonly=on -f pool0