March 27, 2011, 1:37 p.m.
posted by ifman
Automate Data Dumps for PostgreSQL Databases
Building your own backup utility doesn't have to be scary.
PostgreSQL is a robust, open source database server. Like most database servers, it provides utilities for creating backups. PostgreSQL's primary tools for creating backup files are pg_dump and pg_dumpall. However, if you want to automate your database backup processes, these tools have a few limitations:
pg_dump dumps only one database at a time.
pg_dumpall dumps all of the databases into a single file.
pg_dump and pg_dumpall know nothing about multiple backups.
These aren't criticisms of the backup tools—just an observation that customization will require a little scripting. Our resulting script will backup multiple systems, each to their own backup file.
1 Creating the Script
This script uses Python and its ability to execute other programs to implement the following backup algorithm:
Change the working directory to a specified database backup directory.
Rename all backup files ending in .gz so that they end in .gz.old. Existing files ending in .gz.old will be overwritten.
Clean up and analyze all PostgreSQL databases using its vacuumdb command.
Get a current list of databases from the PostgreSQL server.
Why Python? My choice is one of personal preference; this task is achievable in just about any scripting language. However, Python is cross-platform and easy to learn, and its scripts are easy to read.
2 The Code
#!/usr/local/bin/python # /usr/local/bin/pg2gz.py # This script lists all PostgreSQL # databases and pipes them separately # through gzip into .gz files. # INSTRUCTIONS # 1. Review and edit line 1 to reflect the location # of your python command file. # 2. Redefine the save_dir variable (on line 22) to # your backup directory. # 3. To automate the backup process fully, consider # scheduling the regular execution of this script # using cron. import os, string # Redefine this variable to your backup directory. # Be sure to include the slash at the end. save_dir = '/mnt/backup/databases/' # Rename all *.gz backup files to *.gz.old. curr_files = os.listdir(save_dir) for n in curr_files: if n[len(n)-2:] = = 'gz': os.popen('mv ' + save_dir + n + " " + save_dir + n + '.old') else: pass # Vacuum all databases os.popen('vacuumdb -a -f -z') # 'psql -l' produces a list of PostgreSQL databases. get_list = os.popen('psql -l').readlines( ) # Exclude header and footer lines. db_list = get_list[3:-2] # Extract database names from first element of each row. for n in db_list: n_row = string.split(n) n_db = n_row # Pipe database dump through gzip # into .gz files for all databases # except template*. if n_db = = 'template0': pass elif n_db = = 'template1': pass else: os.popen('pg_dump ' + n_db + ' | gzip -c > ' + save_dir + n_db + '.gz')
3 Running the Hack
The script assumes that you have a working installation of PostgreSQL. You'll also need to install Python, which is available through the ports collection or as a binary package. The Python modules used are installed by default.
Double-check the location of your Python executable using:
% which python /usr/local/bin/python
and ensure the first line of the script reflects your location. Don't forget to make the script executable using chmod +x.
On line 22 of the script, redefine the sav_dir variable to reflect the location of your backup directory. As is, the script assumes a backup directory of /mnt/backup/databases/.
You'll probably want to add the script to the pgsql user's crontab for periodic execution. To schedule the script for execution, log in as pgsql or, as the superuser, su to pgsql. Once you're acting as pgsql, execute:
% crontab -e
to open the crontab file in the default editor.
Given the following crontab file, /usr/local/bin/pg2gz.py will execute at 4 AM every Sunday.
# more /var/cron/tabs/pgsql SHELL=/bin/sh PATH=/var/cron/tabs:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin #minute hour mday month wday command 0 4 * * 0 /usr/local/bin/pg2gz.py