Cold Backup Script from ZFS Snapshots

Install requirements (assuming ZFS is already installed and each virtual guest is stored in its own ZFS dataset):

sudo dnf install epel-release && sudo dnf install pigz mbuffer tar
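
Before the first run it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the original script; the helper name missing_tools is hypothetical, and the tool list is an assumption based on the commands the script calls.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the name of every
# command given as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed list of tools the backup script relies on; adjust to your setup.
missing=$(missing_tools pigz mbuffer tar zfs virsh)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi
```

If the check prints nothing, every listed tool was found.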

There is an important known bug: REMOVE all virtual CD drives from the virtual guests before running this script. If an empty CD drive is attached to a guest, the script deletes all of the snapshots it made when it attempts to clean up old backups, which results in empty backup images. A fix still needs to be added so that the script ignores empty or unexpected disk image configurations.
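
One possible direction for that fix is to accept only disk sources that are absolute paths, since virsh domblklist reports an empty drive with "-" as its source. The following is a minimal sketch under that assumption, not a tested fix; the helper name list_disk_sources and the sample input are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "virsh domblklist" output on stdin and print only
# the sources that are absolute paths. This drops the header line, the dashed
# separator, blank lines, and empty drives whose source is shown as "-".
# Assumes image paths contain no spaces, as the script's awk '{print $2}' does.
list_disk_sources() {
    awk '$2 ~ /^\// { print $2 }'
}

# Made-up sample modeled on domblklist output for a guest with an empty CD drive.
sample_output=' Target   Source
------------------------------------------------
 vda      /var/lib/libvirt/images/guest/guest.img
 sda      -
'
printf '%s' "$sample_output" | list_disk_sources
```

In the script, a filter like this could stand in for the chain of grep -v header filters in front of the exclusion grep. Block devices and other sources that start with / would still pass through, so it only addresses the empty-drive case.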

cold_backup.sh
#!/bin/bash -vx
 
# https://github.com/vacri/operations/blob/master/kvm-backup
# https://unix.stackexchange.com/questions/250740/replace-string-after-last-dot-in-bash
# https://stackoverflow.com/questions/29761201/delete-everything-before-last-in-bash
# https://stackoverflow.com/questions/4181703/how-to-concatenate-string-variables-in-bash
# https://serverfault.com/questions/340837/how-to-delete-all-but-last-n-zfs-snapshots
#
# This script shuts down virtual guests (it's helpful for Windows guests to have the qemu-agent + virtio serial device and driver installed + channel qemu-ga; this ensures proper shutdown).
# This version of the script uses pigz for compression and mbuffer to buffer the data stream between tar, pigz, and the backup file.
#
# Install pigz/mbuffer/tar via "sudo dnf install epel-release && sudo dnf install pigz mbuffer tar"
#
# This script was modified to back up from ZFS snapshots. It shuts the guests down, takes a recursive snapshot of each entry in $zpools,
# restarts the guests, and then archives the disk images from the snapshot. The assumption is that you have a ZFS dataset per
# virtual guest or per disk image. Snapshots are named ${zfs_snapshot_name}-<timestamp>, so assuming you use
# /var/lib/libvirt/images/virtual-guest-name/virtual-guest-name.img, the full path of the snapshot copy will be:
# /var/lib/libvirt/images/virtual-guest-name/.zfs/snapshot/cold_backup-<timestamp>/virtual-guest-name.img
#
# Datasets per virtual guest or per virtual disk image are useful in that they allow separate snapshots per guest or per guest image.
#
# These are the guests to back up; list the virtual guest names separated by spaces.
#
machines="HT-SUB-CA"
 
#
# Disk images that should not be backed up. For instance, if a disk image name contains BACKUP or BITLOCKER
# and you don't want it backed up, put BACKUP and BITLOCKER here, separated by spaces.
# grep matches against the full disk image name, so a partial match here is fine.
#
exclude_disks="BACKUP BITLOCKER"
#
 
#
# Keep the house clean: previous backups older than this many days are deleted.
#
days_to_keep="20"
 
#
# Store backups here. An NFS mount from another machine makes sense so that your backups are not local,
# or use a separate local set of disks if you are already using zfs send/receive to another host.
#
 
backup_dir="/vg_backups/VHSRV01"
 
#
# Logs go here.
#
log_dir="/vg_backups/logs"
 
#
# ZFS pools or datasets to snapshot recursively, separated by spaces
zpools="vhsrv01_vg_images vhsrv01_vg_backups/ht-logger01"
#
# ZFS snapshot name. If you make a snapshot by running zfs snapshot -r vg_images@cold_backup then cold_backup is the snapshot name. This script appends a timestamp to it.
zfs_snapshot_name="cold_backup"
#
# How many snapshots do you want to keep? This keeps the most recent N snapshots, regardless of their dates. So if you take one a week and want to keep a year's worth, put 53.
#
keep_snapshots="2"
#
# End parameters
#
#=================================================================
#
# Timestamp for the log file
#
right_now=`date '+%m%d%Y_%H%M%p'`
 
exec 1>"${log_dir}/backup_vms.${right_now}.log" 2>&1
 
print_date() {
   date '+%m%d%Y_%H%M%p'
}
 
zfs_disk_snapshot="/.zfs/snapshot/${zfs_snapshot_name}"
# List all running virtual guest names only and create blank variable
machines_running=$(virsh list --name --state-running)
blank=
machines_to_start=$machines_running
 
# Create variable for disk images to exclude
if [[ "$exclude_disks" != "$blank" ]]
then
        excluded_disk_images=$(sed -r 's/([^ ]+)/-e \1/g' <<<"$exclude_disks")
else
        excluded_disk_images=ThereAreNoneSilly
fi
echo "Disk images containing the following names will be excluded: $excluded_disk_images"
 
# Add 1 to keep_snapshots: the value is fed to tail -n +N, which starts printing at line N, so skipping the newest keep_snapshots entries means starting at line keep_snapshots+1
keep_snapshots=$((keep_snapshots+1))
 
# Get number of CPUs in system to limit use for pigz to all but 1 core
system_cpus=$(grep -c ^processor /proc/cpuinfo)
pigz_cpus=$((system_cpus-1))
if (( pigz_cpus < 1 )); then pigz_cpus=1; fi
 
echo "The number of system CPUs is: $system_cpus"
echo "The number of CPUs to be used by pigz is: $pigz_cpus" 
 
# Shutdown running virtual guests pausing 5 seconds between each shutdown
for machine_running in $machines_running; do virsh shutdown $machine_running; sleep 5; done
 
# Check every 20 seconds if there are running machines, do this for 120 seconds total before moving to next step.
echo -e "Waiting for guests to shut down...\n"
for I in 1 2 3 4 5 6
do
    machines_running=$(virsh list --name --state-running)
    if [[ "$machines_running" = "$blank" ]]
    then
        break
    fi
    echo -e "The following guests are still running:"
    echo -e "$machines_running\n"
    sleep 20
done
 
# Check to see if virtual guests are still running and shut them down using alternate method
machines_running=$(virsh list --name --state-running)
if [[ "$machines_running" != "$blank" ]]
then
    echo -e "Shutting down using qemu-agent\n"
    for machine_running in $machines_running; do virsh shutdown --mode=agent $machine_running; sleep 5;
    done
fi
 
# Check every 20 seconds if there are running machines, do this for 120 seconds total before moving to next step.
for I in 1 2 3 4 5 6
do
    machines_running=$(virsh list --name)
    if [[ "$machines_running" = "$blank" ]]
    then
        echo -e "All guests are shut down"
        break
    fi
    echo -e "The following guests are still running:"
    echo -e "$machines_running\n"
    sleep 20
done
 
 
#
# Destroy old snapshots
#
#existing_snapshots=$(zfs list -H -o name -t snapshot | grep $zfs_snapshot_name)
#if [[ "$existing_snapshots" != "$blank" ]]
#then
#echo "Destroying existing snapshots: $existing_snapshots"
#zfs list -H -o name -t snapshot | grep $zfs_snapshot_name | xargs -n1 sudo zfs destroy
#fi
 
#
# Create new snapshots
#
 
snapshot_timestamp=`date '+%m%d%Y_%H%M%p'`
 
for zpool in $zpools
do
    echo "Creating recursive snapshots: $zpool@${zfs_snapshot_name}-${snapshot_timestamp}"
    zfs snapshot -r "$zpool@${zfs_snapshot_name}-${snapshot_timestamp}"
done
 
#
# Power on virtual guests that were shut off for snapshot
#
echo -e "Starting virtual guests after snapshot creation...\n"
for machine_to_start in $machines_to_start; do virsh start $machine_to_start; sleep 15;
done
 
### Pause script while guests are still starting up before starting backups
sleep 60
 
for machine in $machines
do
   if [[ ! -d ${backup_dir}/${machine} ]];
   then
      mkdir -p ${backup_dir}/${machine}
   fi
 
   echo "Backing up VM configuration"
   # Keep the previous XML dump by renaming it with its modification date
   if [ -f "${backup_dir}/${machine}/${machine}.xml" ]
   then
      file_date=$(date -r "${backup_dir}/${machine}/${machine}.xml" '+%m%d%Y_%H%M%p')
      mv -n "${backup_dir}/${machine}/${machine}.xml" "${backup_dir}/${machine}/${machine}.xml.$file_date"
   else
      echo "   ${backup_dir}/${machine}/${machine}.xml does not exist"
   fi
 
   virsh dumpxml $machine > ${backup_dir}/${machine}/${machine}.xml
 
 
   echo "Copying disk(s)"
   virsh domblklist $machine | grep -v "^$" | grep -v "^Target" | grep -v "\-\-\-\-\-" | grep -v "Source" | grep -v $excluded_disk_images | awk '{print $2}' | while read disk
   do
      echo "This is the full path and disk image: $disk"
      ###This removes the last / in the string along with anything after it
      disk_root="${disk%/*}"
      echo "This is the virtual disk image root path: $disk_root"
      ###This removes the last / in the string along with everything before it
      disk_image="${disk##*/}"
      echo "This is the disk image name: $disk_image"
      echo "This is the zfs disk snapshot: $zfs_disk_snapshot"
      ###This concatenates all variables together
      disk="${disk_root}${zfs_disk_snapshot}-${snapshot_timestamp}/${disk_image}"
      echo "This is the snapshot disk image path: $disk"
      copy_disk="${backup_dir}/${machine}/`basename ${disk}`.tar.gz"
      ###This finds the zfs path of the $disk_root
      ###This will be used in the future in case we want to snapshot per machine instead of all machines
      zfs_root_path=$(zfs list | grep $disk_root | awk '{print $1}')
      echo "This is the ZFS root path: $zfs_root_path"
      ###Rename the old backup archive by appending its modification date to the file name. The "current" image then always has the same name (for rsync purposes) while older images are retained and dated
      if [ -f "$copy_disk" ]
      then
         file_date=$(date -r "$copy_disk" '+%m%d%Y_%H%M%p')
         mv -n "$copy_disk" "$copy_disk.$file_date"
      else
         echo "   $copy_disk does not exist"
      fi
      echo "   Copying $disk to $copy_disk"
      fuser $disk 1>/dev/null 2>&1
      if (( $? == 0 ))
      then
         echo "   Disk $disk is still in use! "
         copy_disk="${copy_disk}.unclean"
      else
         echo "   Copy started at `print_date`"
         tar -c $disk | mbuffer | pigz -1 --rsyncable -p${pigz_cpus} | mbuffer > $copy_disk
         echo "   Return code: $?"
         echo "   Copy ended at `print_date`"
      fi
      existing_snapshots=$(zfs list -H -o name -t snapshot -S creation | grep "${zfs_root_path}@${zfs_snapshot_name}" | xargs -n 1 echo)
      echo "These are the existing $zfs_snapshot_name snapshots: $existing_snapshots"
      snapshots_to_destroy=$(zfs list -H -o name -t snapshot -S creation | grep "${zfs_root_path}@${zfs_snapshot_name}" | tail -n +$keep_snapshots | xargs -n 1 echo)
      echo "These are the snapshots to be destroyed: $snapshots_to_destroy"
      zfs list -H -o name -t snapshot -S creation | grep "${zfs_root_path}@${zfs_snapshot_name}" | tail -n +$keep_snapshots | xargs -n 1 zfs destroy -vr
   done
done
 
   echo "Removing old backups."
   find $backup_dir -type f -mtime +$days_to_keep -ls
   find $backup_dir -type f -mtime +$days_to_keep -exec rm -f {} \;
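
The snapshot retention step in the script relies on tail -n +N, which is why keep_snapshots is incremented by one. The following is a minimal sketch of that selection logic on its own, without touching ZFS; the helper name snapshots_to_prune and the snapshot names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read snapshot names on stdin, newest first, and print
# the ones that fall outside the newest $1 entries (the pruning candidates).
snapshots_to_prune() {
    keep=$1
    # tail -n +K starts printing at line K, so skipping "keep" lines means K = keep + 1.
    tail -n "+$((keep + 1))"
}

# Made-up snapshot names, newest first, as "zfs list -S creation" would order them.
printf '%s\n' \
    'pool/guest@cold_backup-03' \
    'pool/guest@cold_backup-02' \
    'pool/guest@cold_backup-01' | snapshots_to_prune 2
# prints pool/guest@cold_backup-01
```

When there are no more snapshots than the number to keep, the helper prints nothing. With GNU xargs, adding -r (--no-run-if-empty) would stop zfs destroy from being invoked without arguments in that case.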

Note: if you schedule the script from root's crontab (crontab -e), add the following PATH line to the top of the crontab:

PATH=/sbin:/bin:/usr/sbin:/usr/bin
# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  *  command to be executed (no user-name field in a crontab edited with crontab -e)
0 2 * * 0 /root/scripts/cold_backups_zfs_snapshots.sh

Todo: