Forking Processes

Forked processes are the traditional way to structure parallel tasks, and they are a fundamental part of the Unix tool set. Forking is a straightforward way to start an independent program, whether or not it is different from the calling program. Forking is based on the notion of copying programs: when a program calls the fork routine, the operating system makes a new copy of that program in memory and starts running that copy in parallel with the original. Some systems don't really copy the original program (it's an expensive operation), but the new copy works as if it were a literal copy.

After a fork operation, the original copy of the program is called the parent process, and the copy created by os.fork is called the child process. In general, parents can make any number of children, and children can create child processes of their own; all forked processes run independently and in parallel under the operating system's control. It is probably simpler in practice than in theory, though. The Python script in Example 5-1 forks new child processes until you type the letter q at the console.

PP3E\System\Processes\fork1.py

# forks child processes until you type 'q'

import os

def child( ):
    print 'Hello from child',  os.getpid( )
    os._exit(0)  # else goes back to parent loop

def parent( ):
    while 1:
        newpid = os.fork( )
        if newpid == 0:
            child( )
        else:
            print 'Hello from parent', os.getpid( ), newpid
        if raw_input( ) == 'q': break

parent( )

Python's process forking tools, available in the os module, are simply thin wrappers over standard forking calls in the C library. To start a new, parallel process, call the os.fork built-in function. Because this function generates a copy of the calling program, it returns a different value in each copy: zero in the child process, and the process ID of the new child in the parent. Programs generally test this result to begin different processing in the child only; this script, for instance, runs the child function in child processes only.[*]

[*] At least in the current Python implementation, calling os.fork in a Python script actually copies the Python interpreter process (if you look at your process list, you'll see two Python entries after a fork). But since the Python interpreter records everything about your running script, it's OK to think of fork as copying your program directly. It really will if Python scripts are ever compiled to binary machine code.

Unfortunately, this won't work on Windows in standard Python today; fork is too much at odds with the Windows model, and a port of this call is still in the works (see also this chapter's sidebar about Cygwin Python: you can fork with Python on Windows under Cygwin, but it's not exactly the same). Because forking is ingrained in the Unix programming model, though, this script works well on Unix, Linux, and modern Macs:

$ python fork1.py
Hello from parent 671 672
Hello from child 672

Hello from parent 671 673
Hello from child 673

Hello from parent 671 674
Hello from child 674
q

These messages represent three forked child processes; the unique identifiers of all the processes involved are fetched and displayed with the os.getpid call. A subtle point: the child process function is also careful to exit explicitly with an os._exit call. We'll discuss this call in more detail later in this chapter, but if it's not made, the child process would live on after the child function returns (remember, it's just a copy of the original process). The net effect is that the child would go back to the loop in parent and start forking children of its own (i.e., the parent would have grandchildren). If you delete the exit call and rerun, you'll likely have to type more than one q to stop, because multiple processes are running in the parent function.
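To make the two return values of os.fork and the role of os._exit concrete, here is a minimal sketch. The fork_once function name and the exit code of 7 are illustrative assumptions, and the parent's use of os.waitpid and os.WEXITSTATUS to collect the child's exit code is an addition not used in fork1.py. The print call is written with parentheses and a single argument so that it should run under both the Python 2 used by this section's listings and Python 3.

```python
import os

def fork_once(exit_code=7):
    """Fork one child; in the parent, return (child_pid, child_exit_code).

    Hypothetical helper for illustration; not part of fork1.py.
    """
    pid = os.fork()
    if pid == 0:
        # Child: os.fork returned zero. Exit at once so the child never
        # falls through into the code that follows in the parent.
        os._exit(exit_code)
    # Parent: os.fork returned the process ID of the new child.
    waited_pid, status = os.waitpid(pid, 0)   # block until the child exits
    assert waited_pid == pid
    return pid, os.WEXITSTATUS(status)

if __name__ == '__main__':
    child_pid, code = fork_once()
    print('child %d exited with code %d' % (child_pid, code))
```

Without the os._exit call, the child would return from fork_once just as the parent does, and both processes would go on to run whatever code comes next.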

In Example 5-1, each process exits very soon after it starts, so there's little overlap in time. Let's do something slightly more sophisticated to better illustrate multiple forked processes running in parallel. Example 5-2 starts up 10 copies of itself, each copy counting up to 10 with a one-second delay between iterations. The time.sleep built-in call simply pauses the calling process for a number of seconds (you can pass a floating-point value to pause for fractions of seconds).

PP3E\System\Processes\fork-count.py

##########################################################################
# fork basics: start 10 copies of this program running in parallel with
# the original; each copy counts up to 10 on the same stdout stream--forks
# copy process memory, including file descriptors; fork doesn't currently
# work on Windows (without Cygwin): use os.spawnv to start programs on
# Windows instead; spawnv is roughly like a fork+exec combination;
##########################################################################

import os, time

def counter(count):
    for i in range(count):
        time.sleep(1)
        print '[%s] => %s' % (os.getpid( ), i)

for i in range(10):
    pid = os.fork( )
    if pid != 0:
        print 'Process %d spawned' % pid
    else:
        counter(10)
        os._exit(0)

print 'Main process exiting.'

When run, this script starts 10 processes immediately and exits. All 10 forked processes check in with their first count display one second later and every second thereafter. Child processes continue to run, even if the parent process that created them terminates:

$ python fork-count.py
Process 846 spawned
Process 847 spawned
Process 848 spawned
Process 849 spawned
Process 850 spawned
Process 851 spawned
Process 852 spawned
Process 853 spawned
Process 854 spawned
Process 855 spawned
Main process exiting.
$
[846] => 0
[847] => 0
[848] => 0
[849] => 0
[850] => 0
[851] => 0
[852] => 0
[853] => 0
[854] => 0
[855] => 0
[847] => 1
[846] => 1
 ...more output deleted...

The output of all of these processes shows up on the same screen, because all of them share the standard output stream. Technically, a forked process gets a copy of the original process's global memory, including open file descriptors. Because of that, global objects like files start out with the same values in a child process, so all the processes here are tied to the same single stream. But it's important to remember that global memory is copied, not shared; if a child process changes a global object, it changes only its own copy. (As we'll see, this works differently in threads, the topic of the next section.)
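The difference between copied memory and copied file descriptors can be checked with a short experiment. The following is only a sketch under stated assumptions: the run_children function and the counter global are hypothetical names, and a pipe created with os.pipe stands in for the shared standard output stream so that the parent can read back what its children wrote. Each child changes its own copy of the global and writes it to the inherited descriptor; the parent's copy of the global is left untouched.

```python
import os

counter = 0   # hypothetical global; each forked child gets its own copy

def run_children(n=3):
    """Fork n children that write to one pipe.

    Returns (sorted lines written by the children, parent's counter).
    """
    global counter
    read_fd, write_fd = os.pipe()
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: the descriptor was copied by the fork, so it still
            # refers to the same underlying pipe as in the parent.
            try:
                os.close(read_fd)
                counter = 100 + i     # changes only this child's copy
                line = 'child counter=%d\n' % counter
                os.write(write_fd, line.encode('ascii'))
            finally:
                os._exit(0)           # never return into the parent's code
        pids.append(pid)
    os.close(write_fd)                # parent keeps only the read end
    chunks = []
    while True:
        data = os.read(read_fd, 1024)
        if not data:                  # empty read: every writer has closed
            break
        chunks.append(data)
    os.close(read_fd)
    for pid in pids:
        os.waitpid(pid, 0)            # collect each child's exit status
    lines = b''.join(chunks).decode('ascii').splitlines()
    return sorted(lines), counter

if __name__ == '__main__':
    lines, parent_counter = run_children()
    for line in lines:
        print(line)
    print('parent counter=%d' % parent_counter)
```

The lines are sorted before being returned because the order in which the children run is up to the operating system.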

Forking on Windows with Cygwin

Actually, the os.fork call is present in the Cygwin version of Python on Windows. In other words, even though this call is missing in the standard version of Python for Windows, you can now fork processes on Windows with Python if you install and use Cygwin. However, the Cygwin fork call is not as efficient and does not work exactly the same as a fork on true Unix systems.

Cygwin is a freeware package that includes a library that attempts to provide a Unix-like API for use on Windows machines, along with a set of command-line tools that implement a Unix-like environment. It makes it easier to apply Unix skills and code on Windows computers.

According to its current documentation, though, "Cygwin fork( ) essentially works like a non-copy on write version[s] of fork( ) (like old Unix versions used to do). Because of this it can be a little slow. In most cases, you are better off using the spawn family of calls if possible."

In addition to the fork call, Cygwin provides other Unix tools that would otherwise not be available on all flavors of Windows, including os.mkfifo (discussed later in this chapter). It also comes with a gcc compiler environment for building C extensions for Python on Windows that will be familiar to Unix developers. As long as you're willing to use Cygwin libraries to build your application and power your Python, it's very close to Unix on Windows.

Like all third-party libraries, though, Cygwin adds an extra dependency to your systems. Perhaps more critically, Cygwin currently uses the GNU GPL license, which adds distribution requirements beyond those of standard Python. Unlike using Python itself, shipping a program that uses Cygwin libraries may require that your program's source code be made freely available, unless you purchase a special "buy-out" license to free your program of the GPL's requirements. Note that this is a complex legal issue, and you should study Cygwin's license on your own. Its license does, however, impose more constraints than Python's (Python uses a "BSD"-style license, not the GPL).

Still, Cygwin can be a great way to get Unix-like functionality on Windows without installing a completely different operating system such as Linux (a more complete but generally more complex option). For more details, see http://cygwin.com or run a search for Cygwin at Google.com.

See also the standard library's os.spawn family of calls covered later in this chapter for an alternative way to start programs on Unix and Windows that does not require fork and exec calls. To run a simple function call in parallel on Windows (rather than on an external program), also see the section on standard library threads later in this chapter. Both threads and os.spawn calls now work on Windows in standard Python.


The fork/exec Combination

In Examples 5-1 and 5-2, child processes simply ran a function within the Python program and then exited. On Unix-like platforms, forks are often the basis of starting independently running programs that are completely different from the program that performed the fork call. For instance, the fork-exec.py script below forks new processes until we type q again, but child processes run a brand-new program instead of calling a function in the same file.

PP3E\System\Processes\fork-exec.py

# starts programs until you type 'q'

import os

parm = 0
while 1:
    parm = parm+1
    pid = os.fork( )
    if pid == 0:                                             # copy process
        os.execlp('python', 'python', 'child.py', str(parm)) # overlay program
        assert False, 'error starting program'               # shouldn't return
    else:
        print 'Child is', pid
        if raw_input( ) == 'q': break

If you've done much Unix development, the fork/exec combination will probably look familiar. The main thing to notice is the os.execlp call in this code. In a nutshell, this call overlays (i.e., replaces) the program running in the current process with a new program. Because of that, the combination of os.fork and os.execlp means start a new process and run a new program in that process; in other words, launch a new program in parallel with the original program.
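The fork/exec pattern can be wrapped in a small function, sketched here under a few assumptions: run_program is a hypothetical name, os.execv with an explicit program path is used instead of os.execlp so that the example does not depend on PATH settings, and the parent waits for the child with os.waitpid (something fork-exec.py does not do). The exit code 127 for a failed exec is a convention borrowed from Unix shells, not something Python requires.

```python
import os
import sys

def run_program(args):
    """Fork, exec args[0] with args in the child, return its exit code.

    Hypothetical helper for illustration only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the program with the new one.
        try:
            os.execv(args[0], args)   # on success, this never returns
        finally:
            os._exit(127)             # reached only if the exec failed
    # Parent: wait for the new program to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == '__main__':
    code = run_program([sys.executable, '-c', 'import sys; sys.exit(3)'])
    print('child program exited with code %d' % code)
```

Here sys.executable (the path of the running Python interpreter) plays the role of the python command in fork-exec.py, and a one-line -c script plays the role of child.py.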

os.exec call formats

The arguments to os.execlp specify the program to be run by giving command-line arguments used to start the program (i.e., what Python scripts know as sys.argv). If successful, the new program begins running and the call to os.execlp itself never returns (since the original program has been replaced, there's really nothing to return to). If the program cannot be started, Python reports the failure by raising an OSError exception from the call rather than by returning; the assert coded after the call is an extra safeguard that always raises an exception if that line is ever reached.

There are a handful of os.exec variants in the Python standard library; some allow us to configure environment variables for the new program, pass command-line arguments in different forms, and so on. All are available on both Unix and Windows, and they replace the calling program (i.e., the Python interpreter). exec comes in eight flavors, which can be a bit confusing unless you generalize:


os.execv( program, commandlinesequence)

The basic "v" exec form is passed an executable program's name, along with a list or tuple of command-line argument strings used to run the executable (that is, the words you would normally type in a shell to start a program).


os.execl( program, cmdarg1, cmdarg2,... cmdargN)

The basic "l" exec form is passed an executable's name, followed by one or more command-line arguments passed as individual function arguments. This is the same as os.execv(program, (cmdarg1, cmdarg2,...)).


os.execlp


os.execvp

Adding the letter p to the execv and execl names means that Python will locate the executable's directory using your system search-path setting (i.e., PATH).


os.execle


os.execve

Adding a letter e to the execv and execl names means an extra, last argument is a dictionary containing shell environment variables to send to the program.


os.execvpe


os.execlpe

Adding the letters p and e to the basic exec names means to use the search path and to accept a shell environment settings dictionary.

So, when the fork-exec.py script calls os.execlp, individually passed parameters specify a command line for the program to be run on, and the word python maps to an executable file according to the underlying system search-path setting environment variable (PATH). It's as if we were running a command of the form python child.py 1 in a shell, but with a different command-line argument on the end each time.
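To compare the call formats side by side, the sketch below launches the same tiny child program through four of the exec variants. Everything named here is an illustrative assumption: the launch function, the CHILD_CODE string, and the DEMO_EXTRA environment variable are hypothetical, and the parent again uses os.waitpid to read the child's exit code. The "p" forms are left out because their behavior depends on the PATH setting of the machine running the code.

```python
import os
import sys

# Hypothetical child program: exits with its first argument plus the
# value of the DEMO_EXTRA environment variable (zero if it is not set).
CHILD_CODE = ('import os, sys; '
              'sys.exit(int(sys.argv[1]) + '
              'int(os.environ.get("DEMO_EXTRA", "0")))')

def launch(form, number):
    """Fork, exec the child using the named exec form, return its exit code."""
    python = sys.executable
    argv = [python, '-c', CHILD_CODE, str(number)]
    env = dict(os.environ, DEMO_EXTRA='10')    # used by the "e" forms only
    pid = os.fork()
    if pid == 0:
        try:
            if form == 'v':
                os.execv(python, argv)              # arguments as a sequence
            elif form == 'l':
                os.execl(python, *argv)             # arguments listed singly
            elif form == 've':
                os.execve(python, argv, env)        # sequence + environment
            elif form == 'le':
                os.execle(python, *(argv + [env]))  # environment passed last
        finally:
            os._exit(127)    # exec failed or the form was not recognized
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == '__main__':
    for form in ('v', 'l', 've', 'le'):
        print('exec%s => exit code %d' % (form, launch(form, 1)))
```

The "v" and "l" calls start the child with the parent's own environment, so the child sees no DEMO_EXTRA setting unless one already exists in your shell; the "e" calls pass a copy of the environment with that variable added.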

Spawned child program

Just as when typed at a shell, the string of arguments passed to os.execlp by the fork-exec.py script starts another Python program file, child.py, listed next.

PP3E\System\Processes\child.py

import os, sys
print 'Hello from child', os.getpid( ), sys.argv[1]

Here is this code in action on Linux. It doesn't look much different from the original fork1.py, but it's really running a new program in each forked process. The more observant readers may notice that the child process ID displayed is the same in the parent program and the launched child.py program; os.execlp simply overlays a program in the same process.

$ python fork-exec.py
Child is 1094
Hello from child 1094 1

Child is 1095
Hello from child 1095 2

Child is 1096
Hello from child 1096 3
q
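The point that the process ID does not change across an exec can also be checked in code rather than by eye. This is a sketch only: exec_and_report_pid is a hypothetical name, and the pipe plus os.dup2 redirection is an assumption made here so the parent can read what the launched program prints. The parent compares the ID returned by os.fork with the ID the new program reports for itself.

```python
import os
import sys

def exec_and_report_pid():
    """Return (pid returned by os.fork, pid printed by the exec'd program)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)      # make the pipe the child's stdout
            os.execv(sys.executable,
                     [sys.executable, '-c', 'import os; print(os.getpid())'])
        finally:
            os._exit(127)             # reached only if the exec failed
    os.close(write_fd)                # parent keeps only the read end
    output = b''
    while True:
        data = os.read(read_fd, 1024)
        if not data:                  # empty read: the child has finished
            break
        output += data
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, int(output.decode('ascii').strip())

if __name__ == '__main__':
    forked, reported = exec_and_report_pid()
    print('fork returned %d, new program reports %d' % (forked, reported))
```

The two numbers should match, since the exec call loads a new program into the process that the fork created instead of starting another process.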

There are other ways to start up programs in Python, including the os.system and os.popen calls we first met in Chapter 3 (to start shell command lines), and the os.spawnv call we'll meet later in this chapter (to start independent programs on Windows and Unix); we will explore such process-related topics further later in this chapter. We'll also discuss additional process topics in later chapters of this book. For instance, forks are revisited in Chapter 13 to deal with servers and their zombies (i.e., dead processes lurking in system tables after their demise).


