ZFS on CentOS 8

If you use ZFS there is a chance that updates to CentOS 8 will break the module on the next boot, particularly on point releases. Before updating to a new point release, make sure support for that release is available via an updated repo from the ZFS site listed below. KEEP THIS IN MIND, otherwise you'll wonder why all your ZFS volumes disappeared after an update and you can't seem to get them back… To get them back, install the latest ZFS release from the ZFS on Linux site, do a “sudo dnf remove zfs” then a “sudo dnf install zfs” and reboot…
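
If that happens, the recovery described above looks roughly like this (a sketch; update the zfs-release repo package for your new point release first, per the install steps below):

sudo dnf remove zfs
sudo dnf install zfs
sudo reboot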

https://openzfs.github.io/openzfs-docs/Getting%20Started/RHEL%20and%20CentOS.html

sudo dnf install epel-release wget
sudo dnf install https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm
sudo dnf install epel-release
sudo dnf install kernel-devel
sudo dnf install zfs

Install:

sudo dnf install zfs

Limit the amount of RAM it uses:

sudo vim /etc/modprobe.d/zfs.conf

Add:

# Min 2048MB / Max 4096 MB Limit
options zfs zfs_arc_min=2147483648
options zfs zfs_arc_max=4294967296

Load ZFS module

sudo /sbin/modprobe zfs
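
To confirm the module loaded and the ARC limits from /etc/modprobe.d/zfs.conf took effect (these are the standard OpenZFS module parameter paths):

lsmod | grep zfs
cat /sys/module/zfs/parameters/zfs_arc_min
cat /sys/module/zfs/parameters/zfs_arc_max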

Create Zpool with Free Space

https://wiki.archlinux.org/index.php/ZFS

In this setup we are going to use the free space on each drive for ZFS.

Get your storage device info

sudo lsblk

Create partitions

sudo parted -a optimal /dev/sda
print free

Take the Start and End values of your free space and use them to create the partition

mkpart primary 72.0GB 500GB
print
quit
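
If the free space covers the same range on every disk, the same partition can also be created non-interactively (a sketch; substitute your own device name and the Start/End values from print free):

sudo parted -a optimal -s /dev/sdb mkpart primary 72.0GB 500GB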

Repeat for each disk that will have free space added to the zpool. Run

lsblk

to get a list of the partitions that will be used in the zpool.

Create the zpool (here we are using spinning disks and mirroring them)

sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4

Or the equivalent of RAID10 with 4 disks

sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sda4 /dev/sdb4 mirror /dev/sdc4 /dev/sdd4

https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/ Use ashift=13 on any Samsung SSD 850 era and newer.

Note: when I did this parted showed a 428GB partition, df -h showed 386G for the mounted ZFS volume, and lsblk showed a 398.7G partition. Most of the difference is units: parted reports decimal GB while lsblk reports GiB (428 GB ≈ 398.7 GiB), and ZFS reserves a small slice of the pool for its own metadata, which is why df comes in a bit lower again.

Create Zpool with Whole Drive

https://www.svennd.be/create-a-zfs-mirror-pool/

In this setup we are going to use whole drives for ZFS.

Get your storage device info

sudo lsblk

Create the zpool (here we are using spinning disks and mirroring them)

sudo zpool create -f -o ashift=12 -m /mnt/data bigdata mirror /dev/sdc /dev/sdd

https://www.reddit.com/r/zfs/comments/514k2r/kvm_zfs_best_practices/ Use ashift=13 on any Samsung SSD 850 era and newer.

Create datasets in the ZFS pool (we use separate datasets per vm image for snapshot purposes)

sudo zfs create bigdata/vm_guest_name

If using one dataset per VM, create the dataset first and then create the VM inside it; moving the image afterward is like moving between real partitions and will take a while. Also, create RAW images via qemu-img instead of the virt GUI, which defaults to falloc allocation and will take forever. Use this instead:

sudo qemu-img create -f raw TEST.img 50G -o preallocation=off
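
Putting those two steps together for a new guest (a sketch assuming the pool is mounted at /mnt/data as above; the dataset and image names are just the example placeholders):

sudo zfs create bigdata/vm_guest_name
sudo qemu-img create -f raw -o preallocation=off /mnt/data/vm_guest_name/vm_guest_name.img 50G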

ZFS Misc Info

https://www.cyberciti.biz/faq/freebsd-linux-unix-zfs-automatic-mount-points-command/
http://fibrevillage.com/storage/168-zfs-pool-zfs-datasets-and-zfs-volumes

Mounts

ZFS file systems are automatically mounted at boot so there is no entry in fstab. To get a list of your zpools run:

zpool list
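
And to see where each dataset is mounted (standard zfs property queries):

zfs list -o name,mountpoint
zfs get mountpoint <poolname>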

Disk Cache

If you come across a RAID controller that can be used in HBA mode but still lets you control whether non-RAID disks have their disk caches enabled or disabled, enable them: https://serverfault.com/questions/995702/zfs-enable-or-disable-disk-cache

Quote from link: The rationale is that ZFS assumes enabled disk cache and so flushes any critical writes (ie: sync write and uberblock rewrite) via appropriate and specific SATA/SAS commands (ATA FLUSH, FUAs, etc).
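
For SATA disks, the write cache setting can be checked and enabled with hdparm (a sketch; /dev/sdX is a placeholder, and SAS drives are typically handled with sdparm instead):

sudo hdparm -W /dev/sdX
sudo hdparm -W1 /dev/sdX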

ZFS Optimization

Here vg_images is the pool name

sudo zfs set xattr=sa vg_images
sudo zfs set acltype=posixacl vg_images
sudo zfs set compression=lz4 vg_images
sudo zfs set atime=off vg_images
sudo zfs set relatime=off vg_images

It seems sub-datasets inherit the properties of their parent unless you specify otherwise.
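
To check where a property value comes from, look at the SOURCE column of zfs get, which shows local vs. inherited (vg_images/some_dataset below is a placeholder):

sudo zfs get compression,atime,xattr vg_images
sudo zfs get compression,atime,xattr vg_images/some_dataset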

TRIM on SSDs

To run trim:

sudo zpool trim <poolname>

To check trim status:

sudo zpool status -t <poolname>

To make it automatic

sudo zpool set autotrim=on <poolname>
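
To confirm the property is set:

sudo zpool get autotrim <poolname>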

Resilvering/Scrubbing

https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/

If you have 2 or more drives in an array then you can use scrubbing to verify the data and repair it if it is corrupt. The following commands start a scrub, check its status, and stop it if needed:

zpool scrub tank
zpool status tank
zpool scrub -s tank

You should do this weekly if you are using cheap drives like I am:

sudo vim /etc/crontab

Add:

0 2 * * 0 /sbin/zpool scrub tank

Info Commands

Get info about zpool

sudo zpool list -v <poolname>

Get even more info

sudo zpool get all <poolname>

It seems you get even more info if you query the root dataset of the pool rather than the pool itself: the options I set above (compression, atime, etc.) are dataset properties, so they show up under “zfs get all <poolname>” but not under “zpool get all”.

Get device hardware info of zpools

zpool status -c vendor,model,size

Get info on datasets

sudo zfs list

Get info on space used by datasets and snapshots
https://www.thegeekdiary.com/how-to-find-the-space-consumed-by-zfs-snapshots/

zfs list -o space -r rpool

From this output, you can see the amount of space that is available to the dataset (AVAIL), used by the dataset itself (USEDDS), used by its snapshots (USEDSNAP), used by its refreservation (USEDREFRESERV), and used by its children (USEDCHILD).

ZFS Snapshot

https://briankoopman.com/zfs-automated-snapshots/
https://pthree.org/2012/12/19/zfs-administration-part-xii-snapshots-and-clones/

Finally, aside from the data integrity inherent to using ZFS, it was the efficient snapshotting that I was after (well, paired with send/receive).

Take a snapshot of tank

zfs snapshot tank/test@tuesday

List ZFS snapshots

zfs list -r -t snapshot tank

Destroy specific snapshot from tank/test (note, you can destroy any snapshot you want as long as it doesn't have a clone, snapshots are independent of other snapshots).

zfs destroy tank/test@2012:12:18:51:2:19:15

Destroy all snapshots from tank/test

sudo zfs destroy -r tank/test@%

Rollback a snapshot (note: if there are snapshots newer than the one you are rolling back to, those newer snapshots will be destroyed; zfs makes you confirm this with the -r switch)

zfs rollback tank/test@tuesday

And let's say you aren't sure which snapshot you want to roll back to, or you need to access multiple stages of the snapshot history: you can clone a snapshot and inspect the clone before rolling back

zfs clone tank/test@tuesday tank/tuesday
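
The clone shows up as a normal dataset, so you can browse it and destroy it when you're done (a sketch assuming the pool's default mountpoint under /tank):

ls /tank/tuesday
sudo zfs destroy tank/tuesday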

My testing of snapshots for VM guest images: snapshots are all taken without notifying the guest OS, so I'm guessing restoring one will be like the power was pulled.

Delete all Snapshots with cold_backup in the Name

https://blandname.com/2012/04/09/clear-all-zfs-snapshots/

I use this to delete all cold backups (snapshots of powered off virtual guest disk images)

sudo zfs list -H -o name -t snapshot | grep cold_backup | xargs -n1 sudo zfs destroy
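
To preview what would be destroyed, stick an echo in front of the destroy first:

sudo zfs list -H -o name -t snapshot | grep cold_backup | xargs -n1 echo sudo zfs destroy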

Create Snapshot with Specific Name for all Datasets

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/6mhupg6o2/index.html

I use this to create snapshots of powered off virtual guest disk images

sudo zfs snapshot -r vg_images@cold_backup

To help clarify what rolling back a snapshot does, from Oracle: “You can use the zfs rollback command to discard all changes made to a file system since a specific snapshot was created. The file system reverts to its state at the time the snapshot was taken.” So at the time it was taken the later snapshots didn't exist, but you can also delete a snapshot in the middle of the list of snapshots and that's fine. Does this mean that if I roll back to a snapshot later than a deleted one, the deleted snapshot will reappear? No, it doesn't appear so, at least on the backup host where I use syncoid to replicate snapshots. When deleting a snapshot, the sizes of the immediately prior and subsequent snapshots seem to increase, while snapshots earlier or later than those seem to be unaffected.

ZFS Send/Receive

https://www.thegeekdiary.com/zfs-tutorials-creating-zfs-snapshot-and-clones/

We are already having fun, now it's time to be practical. Snapshots are easy, fast, and beautiful in their operation; now let's see if send/receive is as endearing. It's ultimately the most important feature of the set since it's a real backup (that is, to a different data volume/computer/etc).

zfs send geekpool/fs1@oct2013 | ssh node02 "zfs receive testpool/testfs"
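
Subsequent backups can be sent incrementally so only the changes since the previous snapshot cross the wire (the snapshot names here are just examples following the one above):

zfs send -i geekpool/fs1@oct2013 geekpool/fs1@nov2013 | ssh node02 "zfs receive testpool/testfs"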

Non-root user

I prefer to use sudo for everything just for security and consistency. ZFS send/receive was a problem because of this. See the permissions section on how to deal with this.

ZFS Permissions/Access

https://docs.oracle.com/cd/E88353_01/html/E72487/zfs-allow-8.html

If you are going to use ZFS send/receive as a non-root user then you're going to need to setup permissions to do so in ZFS.

We are going to add permissions to the “wheel” group for the zpool vg_images

sudo zfs allow -g wheel compression,clone,create,destroy,hold,promote,receive,rollback,send,snapshot,mount,mountpoint vg_images

To show permissions

sudo zfs allow vg_images

Note: mount and unmount can't be delegated under ZFS on Linux, though I was able to assign the mount permission anyway, and it seemed necessary for me to “zfs destroy” a snapshot, while creating the snapshot didn't require it (09-28-2020, CentOS 8).

NOTE: info below on visudo doesn't appear to be necessary with the above “zfs allow” commands applied, leaving for now until firmly established.

If you want to run commands on non-root accounts without typing in a sudo password then modify the sudoers file to allow specific commands to be run without a password (be as granular and restrictive as possible and evaluate how it can be abused then determine if it's the right option).

sudo visudo

Add the following (https://github.com/jimsalterjrs/sanoid/issues/522)

# Syncoid commands
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs get *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *
#pladmin ALL=(ALL) NOPASSWD:/sbin/zfs send *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs receive *
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *@syncoid_cdn*
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs rollback *tank/video*@*
# We only want to destroy snapshots
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *@syncoid_cdn*
pladmin ALL=(ALL) NOPASSWD:/sbin/zfs destroy *tank/video*@*

Where:
pladmin - username on the remote (slave) server.
tank/video - the name of the ZFS pool and dataset being synchronized
cdn - the beginning of the hostname of the master server.

Example that worked with above sudoers file

sudo zfs send vg_images/SERVER2K19-TEST@cold_backup | ssh pladmin@172.18.18.234 "sudo zfs receive vg_images/SERVER2K19-TEST"


Create a user on the backup server for each virtual host, and use zfs allow to give that user permissions on the corresponding virtual host's backup dataset, e.g.:

sudo zfs allow -u vhsrv01_backup_user clone,create,destroy,hold,promote,receive,rollback,send,snapshot vg_images/vg_backups/vhsrv01_vg_images

Then add NOPASSWD entries in the sudoers file for the needed zfs functions per user, limited to each user's dataset, e.g.:

vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs get *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs snapshot *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs receive *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs rollback *vg_images/vg_backups/vhsrv01_vg_images*
vhsrv01_backup_user ALL=(ALL) NOPASSWD:/sbin/zfs destroy *vg_images/vg_backups/vhsrv01_vg_images*

Think about reversing the send/receive so that it is the backup server that initiates: it SSHes into the source server, starts a send there, and receives locally. That way the backups can't be destroyed from the source server if it's compromised.
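
A rough sketch of that pull-style approach, run from the backup server (user@source_host is a placeholder for an account on the virtualization host, and both ends still need the zfs allow / sudoers setup above):

ssh user@source_host "sudo zfs send vg_images/SERVER2K19-TEST@cold_backup" | sudo zfs receive vg_images/SERVER2K19-TEST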

ZFS Snapshot + ZFS Send/Receive Example

This is for virtual guests on a KVM/libvirt host

sudo virsh shutdown ROOT-CA
sudo virsh list (repeat until ROOT-CA is shut down)
sudo zfs snapshot vhsrv03_vg_images/root-ca@cold_backup
sudo zfs send vhsrv03_vg_images/root-ca@cold_backup | ssh vundermin@172.18.18.171 "sudo zfs receive vhsrv01_vg_images/root-ca"
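
Once the send completes, the guest can be started again:

sudo virsh start ROOT-CA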

Recovery

https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwc.html
https://github.com/xuanngo2001/cust-live-deb/issues/298
https://github.com/openzfs/zfs/issues/10667

There are no “undelete” or scan-for-deleted-files utilities that I've tried, though some are listed in the links above, and since ZFS is copy-on-write maybe they work…

I had an issue with a power outage that lasted long enough to drain the 10 year old battery on a RAID controller (please laugh for reasons you might know). CentOS didn't boot and both the /boot and / filesystems had to be repaired. ZFS wouldn't come up at all; the OS would stop booting at “a start job is running for import zfs pools by cache file”. Booting to recovery and then moving/deleting /etc/zfs/zpool.cache allowed the system to boot, but no zpool was listed, and a plain zpool import would fail.

The following looked like it might work but caused a kernel panic after a while (import pool without cache file):

zpool import -FfmX pool5

The following allowed me to access it in readonly mode and actually recover data!

zpool import -o readonly=on -f pool0
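
With the pool imported read-only the datasets mount as usual, so the data can be copied off with normal tools (a sketch; /pool0 assumes the default mountpoint and /mnt/rescue is wherever you are staging the recovered files):

sudo zfs list -r pool0
sudo rsync -aAX /pool0/ /mnt/rescue/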