Automated Linux Backups with Duplicity, Boto, and AWS S3
We are going to set up automated, incremental backups of our server today.
Overview
We are going to do this using Duplicity and Boto, which together let us upload encrypted, incremental backups to S3.
Duplicity creates incremental backups, encrypts them with GnuPG, and supports sending them to a remote server. It also supports a variety of network backends, which makes it quite flexible. One of these is Boto, a Python client library for Amazon Web Services, including S3, which is where we will store our backups.
Install required software
On CentOS, you can install the required packages for Duplicity, GnuPG, and Boto with yum like this.
yum install duplicity
yum install gnupg2
yum install python-boto
You can also download and install Duplicity from source from the duplicity website.
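If you are running a Debian-based distribution instead, the equivalent packages should be installable with apt; the package names below are the usual ones, but they may differ slightly between releases.

apt-get install duplicity gnupg python-boto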
Set up your encryption key
If you do not already have a GPG key that you plan to use, create a GnuPG key for encrypting your backups with the gpg command. You must set a passphrase on the key, and you will need this passphrase in order to use the key later. The default options for the command should be fine.
Be certain never to lose your key or passphrase, or your backups will be of very little use to you!
gpg --gen-key
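Once the key has been generated, you can list your keys to find the key ID that the backup script below refers to:

gpg --list-keys

Depending on your GnuPG version, the output will look something like pub 2048R/ABC78DC7 2012-08-30; the part after the slash (ABC78DC7 here, which is just a placeholder) is the key ID to use.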
Automate backup of selected directories
We need to set up a rotation for when we perform a full backup and when we perform incremental backups. By default, duplicity will perform an incremental backup if a full backup already exists at the destination, but we want to perform a full backup at regular intervals.
For this reason we specify the option --full-if-older-than 2W, which causes duplicity to do a full backup once every two weeks.
You will need your AWS access key and secret key. You can find out how to get your AWS credentials in my article on using S3 for MySQL backups.
Be sure to set the file permissions on this script so that only you (i.e. root) have read access, so that others cannot get your access credentials from it.
#!/bin/sh
# Set the pass phrase for the GnuPG key that you created earlier.
export PASSPHRASE="put your passphrase here"
# These are your AWS credentials.
export AWS_ACCESS_KEY_ID=YourAWSAccessKey
export AWS_SECRET_ACCESS_KEY=YourAWSSecretKey
# Make sure duplicity is not already running, and exit if it is.
if [ `ps aux | grep -v "grep" | grep --count "duplicity"` -gt 0 ]; then
    echo "duplicity is already running!"
    exit 1
fi
BUCKET=mybucketname # Put your bucket name here
KEYPRINT=ABC78DC7   # Put your GPG key ID here; you can find it with gpg --list-keys
SRC=$1
if [ -z "$SRC" ]; then
    echo "You must specify the source directory"; exit 1
fi
DSTDIR=$2
if [ -z "$DSTDIR" ]; then
    echo "You must specify the destination directory"; exit 1
fi
DST="s3+http://$BUCKET/$DSTDIR"
OPT="--encrypt-key=$KEYPRINT --full-if-older-than 2W --s3-use-new-style --log-file=/var/log/duplicity"
# An optional third argument is appended to the duplicity options.
EXTRA=$3
if [ -n "$EXTRA" ]; then
    OPT="$OPT $EXTRA"
fi
# $OPT is left unquoted on purpose so that it expands into separate arguments.
duplicity $OPT "$SRC" "$DST"
# Clear the sensitive data out of the environment.
unset PASSPHRASE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
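Assuming you save the script as /opt/bin/backupToS3.sh (the path and name are simply the ones used in the cron entry below), it is worth running it by hand once to confirm everything works before automating it. For example, this would back up /etc into the etc-backups folder of your bucket:

chmod 700 /opt/bin/backupToS3.sh
/opt/bin/backupToS3.sh /etc etc-backups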
In order to make sure this runs daily, we add a crontab entry.
Start crontab editor:
sudo crontab -e
Add a line similar to this to your crontab.
0 1 * * * /opt/bin/backupToS3.sh /localfolder folderInS3Bucket > /dev/null 2>&1
This tells it to run every day at 1 AM.
Another possibility you may want to look into with duplicity is the remove-older-than option, which removes backups older than a certain period of time.
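For example, something like the following should delete backup sets older than six months (the period and bucket name are just examples, and remove-older-than will not actually delete anything unless you also pass --force). As in the backup script, your AWS credentials need to be set in the environment first.

duplicity remove-older-than 6M --force s3+http://mybucket/backups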
Restore from backup
Of course a backup is not much use if you cannot restore the files and data it contains. Here are some examples of different ways you can restore part or all of the data from your backups.
You can restore a single file. This command restores the relative path “path/to/myfile.txt” from the specified backup to /usr/local/restored_files.
duplicity --file-to-restore path/to/myfile.txt s3+http://mybucket/backups /usr/local/restored_files
You can also restore an entire directory.
duplicity --file-to-restore path/to/mydir s3+http://mybucket/backups /usr/local/restored_files
And you can restore to a specific point in time, either relative or absolute.
Restore the directory from three days prior.
duplicity -t 3D --file-to-restore path/to/mydir s3+http://mybucket/backups /usr/local/restored_files
Restore the directory as it was at noon on August 30, 2012 (UTC).
duplicity -t 2012-08-30T12:00:00 --file-to-restore path/to/mydir s3+http://mybucket/backups /usr/local/restored_files
And of course you can just restore everything from the most recent version.
duplicity s3+http://mybucket/backups /usr/local/restored_files
You can also use the verify action to find out what has changed since the backup.
duplicity verify s3+http://mybucket/backups /home
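If you are not sure of the exact relative path to pass to --file-to-restore, duplicity can also list every file contained in the most recent backup:

duplicity list-current-files s3+http://mybucket/backups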
This script makes restoring the current version of a file or directory more convenient.
#!/bin/sh
# Set the GnuPG pass phrase and AWS credentials, just as in the backup script.
export PASSPHRASE="put your passphrase here"
export AWS_ACCESS_KEY_ID=YourAWSAccessKey
export AWS_SECRET_ACCESS_KEY=YourAWSSecretKey
BUCKET=mybucketname # Put your bucket name here
KEYPRINT=ABC78DC7   # Put your GPG key ID here
SRCDIR=$1
if [ -z "$SRCDIR" ]; then
    echo "You must specify the source directory (the path inside the S3 bucket)"; exit 1
fi
DST=$2
if [ -z "$DST" ]; then
    echo "You must specify the local destination directory"; exit 1
fi
SRC="s3+http://$BUCKET/$SRCDIR"
duplicity restore --encrypt-key=$KEYPRINT --s3-use-new-style "$SRC" "$DST"
# Clear the sensitive data out of the environment.
unset PASSPHRASE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
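If you save this as, say, /opt/bin/restoreFromS3.sh (the name is just an example), restoring the latest copy of a backed-up folder looks like this. Note that duplicity will generally refuse to overwrite an existing destination unless you add --force.

/opt/bin/restoreFromS3.sh folderInS3Bucket /usr/local/restored_files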