Saturday, October 10, 2009

PostgreSQL on Ubuntu on EC2: Backing It All Up

This post continues what I started with "PostgreSQL on Ubuntu on EC2: The Installation Guide". Once you have your PostgreSQL database server instance running, you will need to backup two different things: your database data, and the instance itself. The database data will be backed up using Elastic Block Storage (EBS) snapshots. Once we have the instance running the backups correctly, we will then create an Amazon Machine Image (AMI) that will allow you to launch a new instance to replace the database server in case it goes down.

Backing Up The Database
First, we need to connect to our database server instance via SSH using the ubuntu user.

We will need to install some dependencies to get our backup script to run:

sudo apt-get install build-essential
sudo apt-get install ruby1.8-dev
sudo apt-get install rubygems
sudo gem update --system

You will need to tweak RubyGems so that the update works correctly, as described here.

Now you can install Gemcutter, which is the new ultra cool repository for gems:

sudo gem install gemcutter
sudo gem tumble

Finally we are ready to install the Amazon EC2 rubygem:

sudo gem install amazon-ec2

Now we can create our backup script. Save this code into the ~/ directory under the name backup_database.rb. You will need to substitute the Amazon ACCESS_KEY_ID and SECRET_ACCESS_KEY, as well and entering the correct EBS volume for the DATABASE_VOLUME constant:

#!/usr/bin/env ruby

require 'rubygems'
require 'AWS'


puts "Starting database snapshot..."
ec2 = => ACCESS_KEY_ID, :secret_access_key => SECRET_ACCESS_KEY)
ec2.create_snapshot(:volume_id => DATABASE_VOLUME)
puts "Database snapshot completed."

Due to the finicky way that Ruby runs as part of a cron job, we are best off creating a shell script that then runs the Ruby backup script. Save this code into the ~/ directory under the name

cd /home/ubuntu
ruby /home/ubuntu/backup_database.rb

Don't forget to make the backup shell script executable:

chmod +x /home/ubuntu/

Now we just need to configure this script to run as part of a cron job, so that the backups take place automatically. The crontab command brings up the list of configured cron tasks for the current user:

crontab -e

This example crontab entry runs the backup daily at midnight, but you may want it to run more frequently:

0 0 * * * /home/ubuntu/

At this point, you should have a fully functional automated backup system. Verify after midnight that the script has run as you expect, by looking to see if a new snapshot has been created, using Elastifox or however you administrate your EC2 instances.

Creating The AMI
Creating the AMI to backup the entire database instance is pretty easy. First, you need to upload the PEM files. Remember you are authenticating as the "ubuntu" user:

scp -i id_rsa-gsg-keypair pk-YOUR.pem cert-YOUR.pem

Use your SSH connection into the database server instance to copy the PEM files to the /mnt directory:

sudo cp /home/ubuntu/*.pem /mnt

Now create the bundle. Make sure you use your Amazon account number (without dashes) as the value for the -u parameter. This can take quite a while, so do not get impatient:

sudo ec2-bundle-vol -d /mnt -k /mnt/pk-YOUR.pem -c /mnt/cert-YOUR.pem -r i386 -u YOURUSERACCOUNTNUMBER

You can now upload the bundle to your Amazon S3 account, in preparation for making available as an AMI. Use something versioned for the -b parameter which is the name of the bundle:

sudo ec2-upload-bundle -b my-database-server-1.0-ami -m /mnt/image.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_ACCESS_KEY

Final step is going back to your local machine, and making the newly created bundle available to be used to start a new instance:

ec2-register my-database-server-1.0-ami/image.manifest.xml

You can now now launch a brand new database server instance based on this AMI, and it will be a clone of your existing database server. This is the procedure you would follow if you need to restore your database server instance from backups.

Restoring Your Database Server From The Backups
In the case that something goes terribly terribly wrong, you can get back to normal as follows:
- startup a new EBS volume from your most recent snapshot backup,
- start up a new server from your database AMI
- configure your new server instance to use the new volume started from the backup data
- switch your elastic IP to point to the new server, or update the references in your application to point to the new server

This concludes part 2 of the great PostgreSQL config post for the EC2 cloud. Hopefully it will help you with a nice simple way to take the basic PostgreSQL instance that you got up and running on Ubuntu/EC2 using the directions in the part 1 post, and add the confidence that backed up data and a completely reproducible configuration provides.


Shlomo said...

Check out Eric Hammond's article on creating consistent EBS snapshots where multiple volumes are involved. It's written with MySQL in mind, but it can be modified to work with Postgresql.

Ron Evans said...

Thanks for pointing out that article. I depended on several of Eric's other articles when writing my previous posts on PostgreSQL/Ubuntu/EC2, as I mentioned in those posts. His work has been seminal in everyone's use of these tools.

That said, the purpose of my post was to provide the simplest way to achieve a backup of PostgreSQL. ec2-consistent-snapshot looks very interesting for MySQL, but I was not up to the work right at the moment for the modifications needed for PostgreSQL configurations.

Shlomo said...

Overheard on Twitter (whose search feature is currently not working for older tweets, so I am paraphrasing):

"The killer feature of EC2 is @esh"

Ron Evans said...

Oh yes! He has saved my assets, not just in general, but personally.

Freida Machoi said...

Beauty. Nicely done, while Hammond's is great for the deep dive this was perfect for the short and sweet.

Jeremy Cothran said...

Thanks so much for this and the earlier lead article - finally got my existing vmware .vmdk ubuntu image successfully converted over to an ec2 ami after much trial&error and moving my postgresql/postgis database into the cloud(EBS) is the next step. Your documentation was the first I found that really put things together well from the beginner's perspective.

madhav said...

Good article.

Am running into issues with the ec2-bundle-vol command.

Unfortunately getting that command to install itself is a huge task. It needs ec2-ami-tools(didn't realize they weren't part of ec2-api-tools) ruby, openssl, multiverse and what not. After resolving all those dependencies, it gives me this error,

ERROR: You need to be root to run /home/ubuntu/ec2-ami-tools-1.3-34544//lib/ec2/amitools/bundlevol.rb

Now am onto resolving above error. :)

nimeacuerdo said...

Thanks Ron for such a useful article

Unknown said...

I'm not sure taking EBS snapshots are the best mechanism for backing up a production database. From the developer forums "To get a clean snapshot, unmounting the volume (or using dm suspend as described elsewhere) is necessary, as you need to tell the OS to flush all pending writes to the device." This is especially true if the database is hot (or you don't want to incur daily downtime). Also, you are likely backing up a lot of empty disk, as the snapshot is of the whole volume but your data just a portion. A much simpler solution I've used is to pg_dump a tar, compress, and use s3cmd to persist to S3.