July 21, 2011, 7:40 p.m.
posted by bruce
Packing and Unpacking Files
Many moons ago (about 10 years), I used machines that had no tools for bundling files into a single package for easy transport. Here is the situation: you have a large set of text files lying around that you need to transfer to another computer. These days, tools like tar are widely available for packaging many files into a single file that can be copied, uploaded, mailed, or otherwise transferred in a single step. As mentioned in an earlier footnote, even Python itself has grown to support zip and tar archives in the standard library (see the zipfile and tarfile modules in the library reference).
Before I managed to install such tools on my PC, though, portable Python scripts served just as well. The script listed in Figure, textpack.py, copies all of the files listed on the command line to the standard output stream, separated by marker lines.
The first line in this file is a Python comment (#...), but it also gives the path to the Python interpreter using the Unix executable-script trick discussed in Chapter 3. If we give textpack.py executable permission with a Unix chmod command, we can pack files by running this program file directly from a shell console and redirecting its standard output stream to the file in which we want the packed archive to show up:
C:\...\PP3E\System\App\Clients\test>type spam.txt
SPAM
spam

C:\......\test>python ..\textpack.py spam.txt eggs.txt ham.txt > packed.all

C:\......\test>type packed.all
::::::::::textpak=>spam.txt
SPAM
spam
::::::::::textpak=>eggs.txt
EGGS
::::::::::textpak=>ham.txt
ham
Running the program this way creates a single output file called packed.all, which contains all three input files, with a header line giving the original file's name before each file's contents. Combining many files into one in this way makes them easy to transfer in a single step: only one file need be copied to floppy, emailed, and so on. If you have hundreds of files to move, this can be a big win.
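For readers who don't have the listing at hand, here is a minimal Python 3 sketch of a textpack-style packer. It is not the book's code: it assumes only the marker format visible in packed.all above (ten colons followed by textpak=>), and it wraps the loop in a pack function with a hypothetical out parameter so the output stream can be swapped out for testing.

```python
#!/usr/bin/env python3
"""textpack-style packer (sketch): concatenate files, separated by marker lines."""
import sys

# Separator prefix, matching the header lines shown in packed.all
MARKER = ':' * 10 + 'textpak=>'


def pack(names, out=None):
    """Write each named file to out (default: stdout), preceded by a marker line."""
    if out is None:
        out = sys.stdout
    for name in names:
        with open(name) as infile:
            text = infile.read()
        out.write(MARKER + name + '\n')      # header line naming the original file
        out.write(text)                      # the file's contents, unchanged
        if text and not text.endswith('\n'):
            out.write('\n')                  # keep the next marker on its own line


if __name__ == '__main__':
    pack(sys.argv[1:])    # e.g.: python textpack.py spam.txt eggs.txt > packed.all
```

One small design choice: if a file doesn't end with a newline, the sketch adds one so the next marker always starts its own line, which means an unpacker will re-create that file with a trailing newline it didn't originally have.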
After such a file is transferred, though, it must somehow be unpacked on the receiving end to re-create the original files. To do so, we need to scan the combined file line by line, watching for header lines left by the packer to know when a new file's contents begin. Another simple Python script, shown in Figure, does the trick.
We could code this in a function like we did in textpack, but there is little point in doing so here; as written, the script relies on standard streams, not function parameters. Run this in the directory where you want unpacked files to appear, with the packed archive file redirected on the command line as the script's standard input stream:
C:\......\test\unpack>python ..\..\textunpack.py < ..\packed.all

C:\......\test\unpack>ls
eggs.txt  ham.txt  spam.txt

C:\......\test\unpack>type spam.txt
SPAM
spam
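The unpacking side can be sketched the same way. Unlike the script described above, which works directly on the standard streams, this hypothetical unpack function takes any iterable of lines plus an outdir parameter (both assumptions added for testability); run as a script, it still reads sys.stdin. The marker must match the packer's exactly.

```python
#!/usr/bin/env python3
"""textunpack-style unpacker (sketch): re-create files from a packed stream."""
import os
import sys

MARKER = ':' * 10 + 'textpak=>'    # must match the packer's separator exactly


def unpack(lines, outdir='.', verbose=False):
    """Copy lines into files; each marker line starts a new output file."""
    output = None
    try:
        for line in lines:
            if line.startswith(MARKER):
                if output is not None:
                    output.close()                      # finish the previous file
                name = line[len(MARKER):].rstrip('\n')  # filename from the header
                if verbose:
                    print('creating:', name)
                output = open(os.path.join(outdir, name), 'w')
            elif output is not None:
                output.write(line)                      # content line, copied as is
            # lines before the first marker are silently skipped in this sketch
    finally:
        if output is not None:
            output.close()


if __name__ == '__main__':
    unpack(sys.stdin)    # e.g.: python textunpack.py < packed.all
```

Because the packer writes one header per file, the unpacker needs no lookahead: it simply switches its output file every time a marker line goes by.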
Packing Files "++"
So far so good; the textpack and textunpack scripts made it easy to move lots of files around without lots of manual intervention. They are prime examples of what are often called tactical scripts: programs you code quickly for a specific task.
But after playing with these and similar scripts for a while, I began to see commonalities that almost cried out for reuse. For instance, almost every shell tool I wrote had to scan command-line arguments, redirect streams to a variety of sources, and so on. Further, almost every command-line utility wound up with a different command-line option pattern, because each was written from scratch.
The following few classes are one solution to such problems. They define a class hierarchy that is designed for reuse of common shell tool code. Moreover, because of the reuse going on, every program that ties into its hierarchy sports a common look-and-feel in terms of command-line options, environment variable use, and more. As usual with object-oriented systems, once you learn which methods to overload, such a class framework provides a lot of work and consistency for free.
And once you start thinking in such ways, you make the leap to more strategic development modes, writing code with broader applicability and reuse in mind. The module in Figure, for instance, adapts the textpack script's logic for integration into this hierarchy.
Here, PackApp inherits members and methods that handle command-line processing and input/output stream redirection from the StreamApp class, imported from another Python module file (listed in Figure). StreamApp provides a "read/write" interface to redirected streams and a standard "start/run/stop" script execution protocol. PackApp simply redefines the start and run methods for its own purposes and reads and writes itself to access its standard streams. Most low-level system interfaces are hidden by the StreamApp class; in OOP terms, we say they are encapsulated.
This module can be both run as a program and imported by a client (remember, Python sets a module's __name__ to '__main__' when it's run directly, so it can tell the difference). When run as a program, the last line creates an instance of the PackApp class and starts it by calling its main method, a method exported by StreamApp to kick off a program run:
C:\......\test>python ..\packapp.py -v -o packedapp.all spam.txt eggs.txt ham.txt
PackApp start.
packing: spam.txt
packing: eggs.txt
packing: ham.txt
PackApp done.

C:\......\test>type packedapp.all
::::::::::textpak=>spam.txt
SPAM
spam
::::::::::textpak=>eggs.txt
EGGS
::::::::::textpak=>ham.txt
ham
This has the same effect as the textpack.py script, but command-line options (-v for verbose mode, -o to name an output file) are inherited from the StreamApp superclass. The unpacker in Figure looks similar when migrated to the object-oriented framework, because the very notion of running a program has been given a standard structure.
This subclass redefines the start and run methods to do the right thing for this script: prepare for and execute a file unpacking operation. All the details of parsing command-line arguments and redirecting standard streams are handled in superclasses:
C:\......\test\unpackapp>python ..\..\unpackapp.py -v -i ..\packedapp.all
UnpackApp start.
creating: spam.txt
creating: eggs.txt
creating: ham.txt
UnpackApp done.

C:\......\test\unpackapp>ls
eggs.txt  ham.txt  spam.txt

C:\......\test\unpackapp>type spam.txt
SPAM
spam
Running this script does the same job as the original textunpack.py, but we get command-line flags for free (-i specifies the input file). In fact, there are more ways to launch classes in this hierarchy than I have space to show here. A command-line pair, -i -, for instance, makes the script read its input from stdin, as though it were simply piped or redirected in the shell:
C:\......\test\unpackapp>type ..\packedapp.all | python ..\..\unpackapp.py -i -
creating: spam.txt
creating: eggs.txt
creating: ham.txt
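The lone dash in -i - follows a common Unix convention: - names the standard stream instead of a file. Here is a sketch of how a tool might honor it; the helper names open_input and open_output are hypothetical and are not StreamApp's actual methods.

```python
import sys


def open_input(name):
    """Readable stream for name: standard input for '-' or None, else a file."""
    if name in (None, '-'):
        return sys.stdin
    return open(name)


def open_output(name):
    """Writable stream for name: standard output for '-' or None, else a file."""
    if name in (None, '-'):
        return sys.stdout
    return open(name, 'w')
```

Callers should avoid closing the stream these helpers return when it is sys.stdin or sys.stdout, since closing a standard stream affects the rest of the program.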
Application Hierarchy Superclasses
This section lists the source code of StreamApp and App, the classes that do all of this extra work on behalf of PackApp and UnpackApp. We don't have space to go through all of this code in detail, so be sure to study these listings on your own for more information. It's all straight Python code.
I should also point out that the classes listed in this section are just the ones used by the object-oriented mutations of the textpack and textunpack scripts. They represent just one branch of an overall application framework class tree, which you can study in this book's examples distribution (browse its directory PP3E\System\App). Other classes in the tree provide command menus, internal string-based file streams, and so on. You'll also find additional clients of the hierarchy that do things like launch other shell tools and scan Unix-style email mailbox files.
StreamApp: adding stream redirection
StreamApp adds a few command-line arguments (-i, -o) and input/output stream redirection to the more general App root class listed later in this section; App, in turn, defines the most general kinds of program behavior, to be inherited in Examples 6-8, 6-9, and 6-10 (i.e., in all classes derived from App).
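To make the option handling concrete, here is a sketch of how -i, -o, and -v could be scanned, with leftover words kept as positional arguments. It swaps in the standard library's argparse module rather than reproducing StreamApp's own scanning code; the function name parse_stream_args is hypothetical, though the ifile, ofile, and args names echo the keyword arguments and attribute used by the PackApp and UnpackApp clients in the GUI code shown later.

```python
import argparse


def parse_stream_args(argv=None):
    """Scan StreamApp-style options; leftover words become positional args."""
    parser = argparse.ArgumentParser(description='stream-redirecting shell tool')
    parser.add_argument('-v', dest='verbose', action='store_true',
                        help='print progress messages')
    parser.add_argument('-i', dest='ifile', default=None,
                        help="input file ('-' means standard input)")
    parser.add_argument('-o', dest='ofile', default=None,
                        help="output file ('-' means standard output)")
    parser.add_argument('args', nargs='*', help='remaining arguments')
    return parser.parse_args(argv)    # argv=None means use sys.argv[1:]
```

Conveniently, argparse treats a lone - as a value rather than an option, so -i - yields ifile == '-', ready for a standard-input fallback.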
App: the root class
The top of the hierarchy knows what it means to be a shell application, but not how to accomplish a particular utility task (those parts are filled in by subclasses). App, listed in Figure, exports commonly used tools in a standard and simplified interface and a customizable start/run/stop method protocol that abstracts script execution. It also turns application objects into file-like objects: when an application reads itself, for instance, it really reads whatever source its standard input stream has been assigned to by other superclasses in the tree (such as StreamApp).
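A minimal sketch of such a root class follows, under stated assumptions: the method names (start, run, stop, main, read, write) follow the prose, but the bodies, the input and output attributes, and the UpperApp client are hypothetical, not the book's listing. The point is the shape: main drives the start/run/stop protocol, and reads and writes on the object are forwarded to whatever streams are currently assigned.

```python
import sys


class App:
    """Hypothetical root class: start/run/stop protocol plus file-like I/O."""

    def __init__(self):
        self.name = self.__class__.__name__
        self.input = sys.stdin     # a StreamApp-like subclass may redirect these
        self.output = sys.stdout
        self.args = sys.argv[1:]   # clients may reset this, as in app.args = ...

    # File-like interface: an app "reads and writes itself"
    def read(self, *size):
        return self.input.read(*size)

    def readline(self):
        return self.input.readline()

    def write(self, text):
        return self.output.write(text)

    # Execution protocol: subclasses override the pieces they need
    def start(self):
        pass

    def run(self):
        raise NotImplementedError('subclasses must define run()')

    def stop(self):
        pass

    def main(self):
        """Kick off a program run: start, run, then stop (even on errors)."""
        self.start()
        try:
            return self.run()
        finally:
            self.stop()


class UpperApp(App):
    """Hypothetical client: copies its input to its output in uppercase."""

    def run(self):
        self.write(self.read().upper())


if __name__ == '__main__':
    UpperApp().main()    # e.g.: python upperapp.py < spam.txt
```

Running stop in a finally clause is a design choice of this sketch, so cleanup still happens if run raises; a client like UpperApp only has to supply run.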
Why use classes here?
Now that I've listed all this code, some readers might naturally want to ask, "So why go to all this trouble?" Given the amount of extra code in the object-oriented version of these scripts, it's a perfectly valid question. Most of the code listed in Figure is general-purpose logic, designed to be used by many applications. Still, that doesn't explain why the packapp and unpackapp object-oriented scripts are larger than the original equivalent textpack and textunpack non-object-oriented scripts.
The answers will become more apparent after the first few times you don't have to write code to achieve a goal, but there are some concrete benefits worth summarizing here:
Although it's not obvious until you start writing larger class-based systems, code reuse is perhaps the biggest win for class-based programs. For instance, in Chapter 11, we will reuse the object-oriented packer and unpacker scripts by invoking them from a menu GUI like so:
from PP3E.System.App.Clients.packapp import PackApp
# ...get dialog inputs, glob filename patterns
app = PackApp(ofile=output)        # run with redirected output
app.args = filenames               # reset cmdline args list
app.main()

from PP3E.System.App.Clients.unpackapp import UnpackApp
# ...get dialog input
app = UnpackApp(ifile=input)       # run with input from file
app.main()                         # execute app class
Because these classes encapsulate the notion of streams, they can be imported and called, not just run as top-level scripts. Further, their code is reusable in two ways: not only do they export common system interfaces for reuse in subclasses, but they can also be used as software components, as in the previous code listing. See the PP3E\Gui\Shellgui directory for the full source code of these clients.
Python doesn't impose object-oriented programming, of course, and you can get a lot of work done with simpler functions and scripts. But once you learn how to structure class trees for reuse, going the extra object-oriented mile usually pays off in the long run.