Hack 40 Automate Data Dumps for PostgreSQL Databases 
Building your own backup utility doesn't have to be scary.
PostgreSQL
is
a robust, open source database server. Like most database servers, it
provides utilities for creating backups.
PostgreSQL's primary tools for creating backup files
are
pg_dump and
pg_dumpall. However, if you want to automate your
database backup processes, these tools have a few limitations:
pg_dump dumps only one database at a time. pg_dumpall dumps all of the databases into a
single file. pg_dump and pg_dumpall know
nothing about multiple backups.
These aren't criticisms of the backup
tools—just an observation that customization will require a
little scripting. Our resulting script will backup multiple systems,
each to their own backup file.
4.7.1 Creating the Script
This script uses Python and its ability to execute other programs to
implement the following backup algorithm:
Change the working directory to a specified database backup directory. Rename all backup files ending in .gz so that
they end in .gz.old. Existing files ending in
.gz.old will be overwritten. Clean up and analyze all PostgreSQL databases using its
vacuumdb command. Get a current list of databases from the PostgreSQL server. Dump each database, piping the results
through gzip, into
its own compressed file.
Why Python? My choice is one of personal preference; this task is
achievable in just about any scripting language. However, Python is
cross-platform and easy to learn, and its scripts are easy to read.
4.7.2 The Code
#!/usr/local/bin/python
# /usr/local/bin/pg2gz.py
# This script lists all PostgreSQL
# databases and pipes them separately
# through gzip into .gz files.
# INSTRUCTIONS
# 1. Review and edit line 1 to reflect the location
# of your python command file.
# 2. Redefine the save_dir variable (on line 22) to
# your backup directory.
# 3. To automate the backup process fully, consider
# scheduling the regular execution of this script
# using cron.
import os, string
# Redefine this variable to your backup directory.
# Be sure to include the slash at the end.
save_dir = '/mnt/backup/databases/'
# Rename all *.gz backup files to *.gz.old.
curr_files = os.listdir(save_dir)
for n in curr_files:
if n[len(n)-2:] = = 'gz':
os.popen('mv ' + save_dir + n + " " + save_dir + n + '.old')
else:
pass
# Vacuum all databases
os.popen('vacuumdb -a -f -z')
# 'psql -l' produces a list of PostgreSQL databases.
get_list = os.popen('psql -l').readlines( )
# Exclude header and footer lines.
db_list = get_list[3:-2]
# Extract database names from first element of each row.
for n in db_list:
n_row = string.split(n)
n_db = n_row[0]
# Pipe database dump through gzip
# into .gz files for all databases
# except template*.
if n_db = = 'template0':
pass
elif n_db = = 'template1':
pass
else:
os.popen('pg_dump ' + n_db + ' | gzip -c > ' + save_dir +
n_db + '.gz')
4.7.3 Running the Hack
The script assumes that you have a working installation of
PostgreSQL. You'll also need to install Python,
which is available through the ports collection or as a binary
package. The Python modules used are installed by default.
Double-check the location of your Python executable using:
% which python
/usr/local/bin/python
and ensure the first line of the script reflects your location.
Don't forget to make the script executable using
chmod +x.
On line 22 of the script, redefine the sav_dir
variable to reflect the location of your backup directory. As is, the
script assumes a backup directory of
/mnt/backup/databases/.
You'll probably want to add the script to the
pgsql user's crontab for periodic
execution. To schedule the script for execution, log in as
pgsql or, as the superuser, su
to pgsql. Once you're acting as
pgsql, execute:
% crontab -e
to open the crontab file in the default editor.
Given the following crontab file,
/usr/local/bin/pg2gz.py will execute at 4 AM
every Sunday.
# more /var/cron/tabs/pgsql
SHELL=/bin/sh
PATH=/var/cron/tabs:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin
#minute hour mday month wday command
0 4 * * 0 /usr/local/bin/pg2gz.py
4.7.4 See Also
|