tech_documents:linux:zfs_sanoid_syncoid [He holds the keys]

https://github.com/jimsalterjrs/sanoid/blob/master/INSTALL.md#centos

Sanoid/Syncoid is a policy-driven ZFS snapshot management and ZFS send/receive replication system. It aims to make frequent snapshots simple, manageable, and automated, with the added ability to back up or replicate those snapshots to other zpools.

Install

Install prerequisites:

sudo dnf install -y epel-release git
# If the repo ID "PowerTools" is not found, try lowercase "powertools" instead (the ID differs between releases)
sudo dnf config-manager --set-enabled PowerTools
sudo dnf install -y perl-Config-IniFiles perl-Data-Dumper perl-Capture-Tiny lzop mbuffer mhash pv

Download files:

sudo git clone https://github.com/jimsalterjrs/sanoid.git
cd sanoid
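# Check out the latest tagged release; version sort keeps e.g. v2.10 after v2.9.
# git is run as root for the tag lookup too, since the clone is root-owned.
sudo git checkout "$(sudo git tag --list 'v*' --sort=version:refname | tail -n 1)"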
sudo cp sanoid syncoid findoid sleepymutex /usr/local/sbin
sudo mkdir /etc/sanoid
sudo cp sanoid.defaults.conf /etc/sanoid
sudo touch /etc/sanoid/sanoid.conf
sudo cp sanoid.conf /etc/sanoid/sanoid.example.conf
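
A quick sanity check after copying can catch a missed file before the systemd units are created. The sketch below assumes the install directory /usr/local/sbin used above; the function name check_sanoid_install is made up for illustration and is not part of sanoid.

```shell
#!/bin/sh
# check_sanoid_install DIR: report any of the four scripts that are missing
# or not executable in DIR. Returns 0 only if all four are present.
# (Hypothetical helper, not part of sanoid.)
check_sanoid_install() {
    dir=$1
    missing=0
    for f in sanoid syncoid findoid sleepymutex; do
        if [ ! -x "$dir/$f" ]; then
            echo "missing or not executable: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the host set up above:
#   check_sanoid_install /usr/local/sbin && echo "sanoid scripts installed"
```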

Create systemd service:

cat << "EOF" | sudo tee /etc/systemd/system/sanoid.service
[Unit]
Description=Snapshot ZFS Pool
Requires=zfs.target
After=zfs.target
Wants=sanoid-prune.service
Before=sanoid-prune.service
ConditionFileNotEmpty=/etc/sanoid/sanoid.conf

[Service]
Environment=TZ=UTC
Type=oneshot
ExecStart=/usr/local/sbin/sanoid --take-snapshots --verbose
EOF

Create prune service:

cat << "EOF" | sudo tee /etc/systemd/system/sanoid-prune.service
[Unit]
Description=Cleanup ZFS Pool
Requires=zfs.target
After=zfs.target sanoid.service
ConditionFileNotEmpty=/etc/sanoid/sanoid.conf

[Service]
Environment=TZ=UTC
Type=oneshot
ExecStart=/usr/local/sbin/sanoid --prune-snapshots --verbose

[Install]
WantedBy=sanoid.service
EOF

Create a systemd timer:

cat << "EOF" | sudo tee /etc/systemd/system/sanoid.timer
[Unit]
Description=Run Sanoid Every 15 Minutes
Requires=sanoid.service

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
EOF
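
The OnCalendar value *:0/15 means "every hour, at minute 0 and every 15 minutes after that". As a rough illustration only, the hypothetical helper below expands a START/STEP minute field into the minutes it matches; it is not how systemd parses the expression, just the same arithmetic.

```shell
#!/bin/sh
# calendar_minutes START STEP: print the minutes of each hour matched by a
# systemd minute field of the form START/STEP (e.g. 0/15), space-separated.
# (Hypothetical helper for illustration only.)
calendar_minutes() {
    start=$1
    step=$2
    out=""
    m=$start
    while [ "$m" -le 59 ]; do
        out="${out:+$out }$m"
        m=$((m + step))
    done
    echo "$out"
}

calendar_minutes 0 15   # prints: 0 15 30 45

# On the host itself, systemd can check the expression directly:
#   systemd-analyze calendar '*:0/15'
```

With Persistent=true, a run that was missed while the machine was off is triggered once when the timer is next activated.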

Edit the sanoid.conf file. Here we will be taking snapshots of all child datasets of vg_images:

sudo vim /etc/sanoid/sanoid.conf

Add the following. If you are also using syncoid, change the template from “production” to “backup” on the destination/backup server.

Note: Windows servers, especially a WSUS server or file server, will generate 2GB+ snapshots at regular intervals. You may want to reduce the number of retained snapshots for those servers, or find a way to stop them from writing so much to the drive (I don't know whether it's just swap activity or not).

# you can also handle datasets recursively.
[vg_images]
        use_template = production
        recursive = yes
        # if you want sanoid to manage the child datasets but leave this one alone, set process_children_only.
        process_children_only = yes


#############################
# templates below this line #
#############################

# name your templates template_templatename. you can create your own, and use them in your module definitions above.

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes
        
[template_backup]
        autoprune = yes
        frequently = 0
        hourly = 30
        daily = 90
        monthly = 12
        yearly = 0

        ### don't take new snapshots - snapshots on backup
        ### datasets are replicated in from source, not
        ### generated locally
        autosnap = no

        ### monitor hourlies and dailies, but don't warn or
        ### crit until they're over 48h old, since replication
        ### is typically daily only
        hourly_warn = 2880
        hourly_crit = 3600
        daily_warn = 48
        daily_crit = 60
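
To get a feel for what these templates mean in practice, the retention values can be added up. This is a back-of-the-envelope sketch, assuming each retention type is kept as its own snapshot (sanoid names them with _hourly, _daily and _monthly suffixes) and that the hourly_* thresholds are in minutes while the daily_* thresholds are in hours, which matches the "48h" comment in the template. The function names are invented for this example.

```shell
#!/bin/sh
# steady_state_snapshots HOURLY DAILY MONTHLY: number of autosnap snapshots
# kept per dataset once every retention bucket is full.
steady_state_snapshots() {
    echo $(( $1 + $2 + $3 ))
}

# minutes_to_hours MINUTES: convert an hourly_warn/hourly_crit value to hours.
minutes_to_hours() {
    echo $(( $1 / 60 ))
}

echo "production: $(steady_state_snapshots 36 30 3) snapshots per dataset"     # 69
echo "backup:     $(steady_state_snapshots 30 90 12) snapshots per dataset"    # 132
echo "hourly_warn: $(minutes_to_hours 2880)h, hourly_crit: $(minutes_to_hours 3600)h"  # 48h, 60h
```

Multiply the per-dataset figure by the number of child datasets under vg_images to estimate how many snapshots the pool will carry.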

Reload systemd and enable/start services

sudo systemctl daemon-reload
sudo systemctl enable sanoid-prune.service
sudo systemctl enable sanoid.timer
sudo systemctl start sanoid.timer
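
Once the timer is running, it is worth confirming that snapshots actually appear. The sketch below counts sanoid-created snapshots of a given type from a list of snapshot names on stdin. It assumes sanoid's usual autosnap_DATE_TIME_TYPE naming; count_autosnaps is a made-up helper name.

```shell
#!/bin/sh
# count_autosnaps TYPE: read snapshot names on stdin and count those that
# sanoid created for the given type (hourly, daily, monthly, ...).
# Prints 0 (and still succeeds) when nothing matches.
count_autosnaps() {
    grep -c "@autosnap_.*_$1\$" || true
}

# Usage on the host set up above:
#   systemctl list-timers sanoid.timer          # is the timer scheduled?
#   sudo systemctl start sanoid.service         # trigger one run right away
#   zfs list -H -t snapshot -o name -r vg_images | count_autosnaps hourly
```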

Notes:

On the destination, if you roll back to an earlier snapshot, which requires removing the newer snapshots, then the newer snapshots will be restored on the next syncoid run.
On the destination, if you destroy a snapshot, it will not be restored on the next syncoid run. However, if you then roll back to a snapshot older than the destroyed ones, the next syncoid run will restore the destroyed ones as well.
If you are using a static snapshot name and destroy that snapshot on the destination, it won't be restored on the next syncoid run. However, if you recreate the snapshot on the source and then run syncoid, the new snapshot will be replicated to the destination.
If a snapshot already exists on the destination, it can't be “overwritten”; it must be destroyed before the updated snapshot can be replicated.
So far, randomly destroying snapshots on the destination doesn't seem to affect the other snapshots: they can still be rolled back to and the VM image works.
NOTE: manually deleting all snapshots on the source can throw the destination out of sync, and syncoid will then refuse to delete old syncoid snapshots, leaving them on the source where they eat up all the space. The only way to fix this is to delete the destination datasets and re-run syncoid, which then does the initial full sync.
Set up monitoring for the syncoid script: it will continue to run but throw errors, so your backups might not be valid.
IDEA: set up a syncoid script that destroys the destination on error but renames the existing dataset.
IDEA: Email notification on error and a daily summary.
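
One possible starting point for the monitoring and email ideas above is a small wrapper that runs syncoid and reports a non-zero exit status. This is only a sketch under assumptions: run_and_report and NOTIFY_CMD are invented names, the mail command needs a mailx package and a working MTA, and the host, pool and address in the usage comment are placeholders.

```shell
#!/bin/sh
# run_and_report LABEL COMMAND [ARGS...]
# Run COMMAND; on a non-zero exit, send a one-line message to the command
# named in $NOTIFY_CMD (or to stderr if it is unset) and return the same
# exit status. Hypothetical helper, not part of sanoid/syncoid.
run_and_report() {
    label=$1
    shift
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        msg="$label failed with exit status $status on $(uname -n)"
        if [ -n "${NOTIFY_CMD:-}" ]; then
            # NOTIFY_CMD is split on whitespace on purpose so it can carry arguments
            echo "$msg" | $NOTIFY_CMD
        else
            echo "$msg" >&2
        fi
    fi
    return "$status"
}

# Usage (placeholder host, pool and address):
#   NOTIFY_CMD='mail -s syncoid-failure admin@example.com'
#   run_and_report syncoid /usr/local/sbin/syncoid -r vg_images root@backuphost:backuppool/vg_images
```

Returning the original exit status means a cron job or systemd unit that calls the wrapper still sees the failure, so existing alerting on failed units keeps working.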