Handling File Uploads via CGI

Credit: Mauro Cicio


You want to let a visitor to your web site upload a file to the web server, either for storage or processing.


The CGI class provides a simple interface for accessing data sent through HTTP file upload. You can access an uploaded file through CGI#params as though it were any other CGI form variable.

If the uploaded file size is smaller than 10 kilobytes, its contents are made available as a StringIO object. Otherwise, the file is put into a Tempfile on disk: you can read the file from disk and process it, or move it to a permanent location.
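
The choice between the two kinds of buffer can be modeled in a few lines. This is a simplified sketch, not the cgi library's own code: SPOOL_THRESHOLD and upload_buffer_for are hypothetical names, and the 10240-byte cutoff mirrors the 10-kilobyte figure above. In the cgi library the comparison appears to be made against the length of the whole request body rather than the file alone, so treat the cutoff as approximate.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical cutoff mirroring the 10 KB figure: anything bigger is
# spooled to disk.
SPOOL_THRESHOLD = 10240

# Simplified model (not the cgi library's code) of picking a buffer for an
# upload of +content_length+ bytes.
def upload_buffer_for(content_length)
  if content_length > SPOOL_THRESHOLD
    buffer = Tempfile.new('CGI')  # lives on disk until deleted
    buffer.binmode                # uploads are arbitrary bytes
    buffer
  else
    StringIO.new(''.b)            # binary string kept in memory
  end
end

small = upload_buffer_for(2_000)
large = upload_buffer_for(50_000)
puts small.class  # StringIO
puts large.class  # Tempfile
large.close!      # remove the temp file this demo created
```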

Here's a CGI that accepts file uploads and saves the files to a special directory on disk:

	# upload.rb

	# Save uploaded files to this directory
	UPLOAD_DIR = "/usr/local/www/uploads"

	require 'cgi'
	require 'stringio'

The CGI has two main parts: a method that prints a file upload form and a method that processes the results of the form. The method that prints the form is very simple:

	def display_form(cgi)
	  action = cgi.script_name
	  return <<EOF
	<form action="#{action}" method="post" enctype="multipart/form-data">
	 File to Upload: <input type="file" name="file_name"><br>
	 Your email address: <input type="text" name="email_address"><br>
	 <input type="submit" name="Submit" value="Submit Form">
	</form>
	EOF
	end

The method that processes the form is a little more complex:

	def process_form(cgi)
	  email = cgi.params['email_address'][0]
	  fileObj = cgi.params['file_name'][0]

	  str = '<h1>Upload report</h1>' +
	    "<p>Thanks for your upload, #{CGI.escapeHTML(email.read)}</p>"
	  # Skip the save if no file was chosen (the browser sends an empty filename)
	  if fileObj && !fileObj.original_filename.empty?
	    path = fileObj.original_filename
	    str += "Original Filename : #{CGI.escapeHTML(path)}" + cgi.br
	    dest = File.join(UPLOAD_DIR, sanitize_filename(path))

	    str += "Destination : #{CGI.escapeHTML(dest)} <br>"
	    File.open(dest.untaint, 'wb') { |f| f << fileObj.read }

	    # Delete the temporary file if one was created
	    local_temp_file = fileObj.local_path()
	    File.unlink(local_temp_file) if local_temp_file
	  end
	  return str
	end

The process_form method calls a method sanitize_filename to pick a new filename based on the original. The new filename is stripped of the leading directory path and of any characters that aren't valid on the server's filesystem. This is important for security reasons: without it, a client could send a name like "../../etc/passwd" and have the file written outside the upload directory. It's also important to pick a new name because Internet Explorer on Windows submits full paths like "c:\hot\fondue.txt" where other browsers submit just "fondue.txt". We'll define that method now:

	def sanitize_filename(path)
	  # For files uploaded by Windows users, strip off the beginning path.
	  # This also removes every /, the one character Unix forbids in a name.
	  name = path.gsub(/\A.*[\\\/]/m, '')
	  if RUBY_PLATFORM =~ %r{mswin|mingw}
	    # Replace characters that are illegal on NTFS (plus a few that
	    # old MS-DOS disallowed) with _
	    name = name.gsub(/[\\\/:*?"<>|+\[\] \x00-\x1f]/, '_')
	  else
	    # Unix allows every character except / and NUL; / is already gone.
	    name = name.delete("\0")
	  end
	  return name
	end
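
To see what the path stripping buys you, here is a standalone sketch you can run without a web server. The names UPLOAD_ROOT, strip_client_path, and safe_destination are hypothetical. strip_client_path applies the same "drop everything up to the last slash or backslash" idea as sanitize_filename, and safe_destination adds a second, defense-in-depth check that the final expanded path really lies inside the upload directory.

```ruby
# Hypothetical upload directory (mirrors UPLOAD_DIR in the recipe).
UPLOAD_ROOT = '/usr/local/www/uploads'

# Drop everything up to the last / or \, so both "c:\hot\fondue.txt"
# and "../../../../etc/passwd" are reduced to a bare filename.
def strip_client_path(name)
  name.sub(/\A.*[\\\/]/m, '')
end

# Build the destination path, then refuse it unless it still lies
# strictly inside UPLOAD_ROOT (this rejects "", "." and "..").
def safe_destination(name)
  dest = File.expand_path(File.join(UPLOAD_ROOT, strip_client_path(name)))
  unless dest.start_with?(UPLOAD_ROOT + File::SEPARATOR)
    raise ArgumentError, "refusing upload name #{name.inspect}"
  end
  dest
end

# Without stripping, a hostile name climbs out of the upload directory:
puts File.expand_path(File.join(UPLOAD_ROOT, '../../../../etc/passwd'))
# With stripping, it lands where it belongs:
puts safe_destination('../../../../etc/passwd')
puts safe_destination('c:\\hot\\fondue.txt')
```

On a Unix system the first line prints /etc/passwd, while both safe_destination calls return paths inside /usr/local/www/uploads.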

Finally we have the CGI code itself, which calls the appropriate method and prints out the results in an HTML page:

	cgi = CGI.new('html3')
	if cgi.request_method !~ %r{POST}
	  buf = display_form(cgi)
	else
	  buf = process_form(cgi)
	end
	cgi.out() do
	  cgi.html() do
	    cgi.head{ cgi.title{'Upload Form'} } + cgi.body() { buf }
	  end
	end

	exit 0


This CGI script presents the user with a form that lets them choose a file from their local system to upload. When the form is POSTed, CGI accepts the uploaded file data and stores it as a CGI parameter. As with any other CGI parameter (like email_address), the uploaded file is keyed to the name of the HTML form element: in this case, file_name.

If the file is larger than 10 kilobytes, it will be written to a temporary file and cgi.params['file_name'][0] will be a Tempfile object. If the file is small, it will be kept directly in memory as a StringIO object. Either way, the object will have a few methods not found in normal Tempfile or StringIO objects. The most useful of these are original_filename and content_type; the object also supports the ordinary read method.
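
How does a plain StringIO or Tempfile end up with extra methods? One way is to define singleton methods on that single object, which is roughly the approach the cgi library takes. The following is a simplified illustration rather than the library's actual code; decorate_upload is a hypothetical name.

```ruby
require 'stringio'

# Hypothetical helper: give one buffer object the extra accessors an
# upload needs, without touching the StringIO (or Tempfile) class itself.
def decorate_upload(body, filename, content_type)
  body.define_singleton_method(:original_filename) { filename }
  body.define_singleton_method(:content_type) { content_type }
  body
end

upload = decorate_upload(StringIO.new('fondue recipe'), 'fondue.txt', 'text/plain')
puts upload.original_filename   # fondue.txt
puts upload.content_type        # text/plain
puts upload.read                # fondue recipe
puts StringIO.new.respond_to?(:original_filename)  # false
```

Because the methods live on the object's singleton class, every other StringIO in the program is unaffected.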

The original_filename method returns the name of the file, as seen on the computer of the user who uploaded it. The content_type method returns the MIME type of the uploaded file, again as estimated by the computer that did the upload. You can use this to restrict the types of file you'll accept as uploads (note, however, that a custom client can lie about the content type):

	# Limit uploads to BMP files.
	raise 'Wrong type!' unless fileObj.content_type =~ %r{image/bmp}
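
Because the declared type can be forged, you may also want to look at the data itself. The sketch below assumes the upload object responds to content_type, rewind, and read (as the upload objects described above do) and relies on the fact that every BMP file begins with the two ASCII bytes "BM". The helper name looks_like_bmp? is hypothetical, and a signature check is only a sanity test, not proof that the file is a valid image.

```ruby
require 'stringio'

# Hypothetical helper: accept an upload only if the client claims it's a
# BMP *and* its first two bytes are the BMP signature "BM".
def looks_like_bmp?(upload)
  return false unless upload.content_type =~ %r{\Aimage/bmp\z}
  upload.rewind
  signature = upload.read(2)
  upload.rewind            # leave the stream where later code expects it
  signature == 'BM'
end

real = StringIO.new("BM\x36\x00\x00\x00".b)
def real.content_type; 'image/bmp'; end

fake = StringIO.new('GIF89a')   # a GIF header mislabeled as a BMP
def fake.content_type; 'image/bmp'; end

puts looks_like_bmp?(real)  # true
puts looks_like_bmp?(fake)  # false
```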

Every StringIO object supports a read method that returns the contents of the underlying string. For the sake of a uniform interface, a Tempfile object created by file upload also has a read method that returns the contents of the file. For most applications, you don't need to check whether you've got a StringIO or a Tempfile: you can just call read and get the data. However, a Tempfile can be quite large (there's a reason it was written to disk in the first place), so don't do this unless you trust your users or have a lot of memory. Otherwise, check the size of a Tempfile with File.size(fileObj.local_path) and read it a block at a time.
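
Reading a block at a time might look like this. The sketch assumes only that the upload object supports size, rewind, and read(length), which both StringIO and Tempfile do. The helper copy_upload_in_chunks, the 16 KB chunk size, and the 5 MB MAX_UPLOAD_BYTES limit are illustrative choices, not values from the recipe.

```ruby
require 'stringio'
require 'tempfile'

CHUNK_SIZE = 16 * 1024               # illustrative block size
MAX_UPLOAD_BYTES = 5 * 1024 * 1024   # illustrative limit (5 MB)

# Hypothetical helper: copy an upload (StringIO or Tempfile) to +dest+
# without holding more than CHUNK_SIZE bytes of it in memory at once.
# Returns the number of bytes written.
def copy_upload_in_chunks(upload, dest)
  # Tempfile#size and StringIO#size both report the byte count.
  if upload.size > MAX_UPLOAD_BYTES
    raise ArgumentError, "upload too large (#{upload.size} bytes)"
  end
  upload.rewind
  written = 0
  File.open(dest, 'wb') do |out|
    # read(n) returns nil once the end of the data is reached
    while (chunk = upload.read(CHUNK_SIZE))
      out.write(chunk)
      written += chunk.bytesize
    end
  end
  written
end
```

In process_form, a call like copy_upload_in_chunks(fileObj, dest) could take the place of the line that reads the whole upload with fileObj.read, keeping memory use bounded regardless of upload size.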

To see where a Tempfile is located on disk, call its local_path method. If you plan to write the uploaded file to disk, it's more efficient to move a Tempfile with FileUtils.mv than to read it into memory and immediately write it back to another location.
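
A save routine that prefers moving over copying could look like the sketch below. It assumes, as described above, that a disk-backed upload answers local_path with its path, while an in-memory upload either lacks the method or returns nil. save_upload is a hypothetical helper name.

```ruby
require 'fileutils'
require 'stringio'
require 'tempfile'

# Hypothetical helper: put an uploaded file at +dest+ as cheaply as possible.
def save_upload(upload, dest)
  temp_path = upload.respond_to?(:local_path) ? upload.local_path : nil
  if temp_path
    # Disk-backed upload: close our handle (Tempfile#close does not delete
    # the file), then move it into place instead of copying its bytes.
    upload.close
    FileUtils.mv(temp_path, dest)
  else
    # In-memory upload: just write the bytes out.
    upload.rewind
    File.open(dest, 'wb') { |f| f.write(upload.read) }
  end
  dest
end
```

When source and destination are on the same filesystem, FileUtils.mv is a cheap rename; across filesystems it falls back to copying, which still avoids pulling the whole file into Ruby's memory.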

Temporary files are deleted when the Ruby interpreter exits (or when their Tempfile objects are garbage-collected), but some web frameworks keep a single Ruby interpreter around indefinitely. If you're not careful, a long-running application can fill up your disk or partition with old temporary files. Within a CGI script, you should explicitly delete temporary files when you're done with them (except, of course, the ones you move to permanent positions elsewhere on the filesystem).
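
One way to make sure a temporary file goes away even when processing fails partway through is an ensure clause. with_temp_cleanup is a hypothetical helper; like the recipe's process_form, it assumes local_path returns the temp file's path or nil, and it quietly ignores a file that has already been moved elsewhere.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: yield the upload, then delete its temporary file
# (if it has one) no matter how the block exits.
def with_temp_cleanup(upload)
  yield upload
ensure
  temp_path = upload.respond_to?(:local_path) ? upload.local_path : nil
  if temp_path
    begin
      File.unlink(temp_path)
    rescue Errno::ENOENT
      # Already moved to a permanent home (or deleted); nothing to do.
    end
  end
end
```

The method returns whatever the block returns, so it can wrap the body of a form handler without changing its result.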
