Debugging the Raw HTTP Exchange

Problem

You want to analyze the HTTP request a browser makes to your server and the corresponding HTTP response. For example, your server doesn't supply the expected response to a particular request, so you want to see exactly what the components of the request are.

Solution

For simple requests, connect to the web server with Telnet and type in the request headers. A sample exchange is shown in the following example.

Sending a request with Telnet

% telnet www.example.com 80
Trying 10.3.75.31...
Connected to www.example.com (10.3.75.31).
Escape character is '^]'.
GET / HTTP/1.0
Host: www.example.com

HTTP/1.1 200 OK
Date: Sun, 03 Dec 2006 02:54:01 GMT
Server: Apache/2.2.2 (Unix)
Last-Modified: Fri, 20 Oct 2006 20:16:24 GMT
ETag: "1348010-2c-4c23b600"
Accept-Ranges: bytes
Content-Length: 44
Connection: close
Content-Type: text/html

[ the body of the response ]

Discussion

When you type in request headers, the web server doesn't know that it's just you typing and not a web browser submitting a request. However, some web servers time out if they wait too long for a request, so it can be useful to type the request beforehand and then paste it into Telnet. The first line of the request contains the request method (GET), a space, the path of the file you want (/), another space, and the protocol you're using (HTTP/1.0). The next line, the Host header, tells the server which virtual host to use if several share the same IP address. A blank line tells the server that the request is over; the server then sends back its response: first headers, then a blank line, and then the body of the response. The Netcat program (http://netcat.sourceforge.net/) is also useful for this sort of task.
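If you'd rather script the exchange than type it, the structure just described translates directly into code. The following is a minimal sketch, not part of any library: build_raw_request(), split_raw_response(), and send_raw_request() are hypothetical helper names. It assumes an HTTP/1.0 request, so the server closes the connection after responding and reading until EOF collects the whole response. Lines end with CRLF, and the first blank line separates headers from body.

```php
<?php
// Hypothetical helper: build the text you would otherwise type into Telnet.
// HTTP lines end with CRLF; the empty line at the end marks the end of
// the request headers.
function build_raw_request(string $method, string $host, string $path): string {
    return "$method $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

// Hypothetical helper: split a raw response into its status line,
// header lines, and body. The first CRLF CRLF separates headers from body.
function split_raw_response(string $raw): array {
    $parts = explode("\r\n\r\n", $raw, 2);
    $lines = explode("\r\n", $parts[0]);
    $status = array_shift($lines);
    return ['status' => $status, 'headers' => $lines, 'body' => $parts[1] ?? ''];
}

// Hypothetical helper: send the request over a plain socket instead of Telnet.
function send_raw_request(string $host, string $request, int $port = 80): string {
    $fp = fsockopen($host, $port, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("Can't connect to $host: $errstr ($errno)");
    }
    fwrite($fp, $request);
    // With HTTP/1.0 the server closes the connection after responding,
    // so reading until EOF collects the whole response.
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? '' : $response;
}
```

For example, split_raw_response(send_raw_request('www.example.com', build_raw_request('GET', 'www.example.com', '/'))) would return the status line, the header lines, and the body as separate elements.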

Pasting text into Telnet can get tedious, and it's even harder to make POST requests that way. If you make a request with the PEAR HTTP_Request class, you can retrieve the response headers and the response body with the getResponseHeader( ) and getResponseBody( ) methods, as shown in the following example.

Getting response headers with HTTP_Request

<?php
require 'HTTP/Request.php';
$r = new HTTP_Request('http://www.example.com/submit.php');
$r->setMethod(HTTP_REQUEST_METHOD_POST);
$r->addPostData('monkey','uncle');
$r->sendRequest();

$response_headers = $r->getResponseHeader();
$response_body    = $r->getResponseBody();
?>

To retrieve a specific response header, pass its name, in all lowercase, to getResponseHeader( ); for example, getResponseHeader('content-type'). Without an argument, getResponseHeader( ) returns an array containing all the response headers. HTTP_Request doesn't save the outgoing request in a variable, but you can reconstruct it by calling the _buildRequest( ) method, as shown in the following example. By PEAR convention, the leading underscore marks _buildRequest( ) as an internal method, so it is best used only for debugging.

Getting request headers with HTTP_Request

<?php
require 'HTTP/Request.php';

$r = new HTTP_Request('http://www.example.com/submit.php');
$r->setMethod(HTTP_REQUEST_METHOD_POST);
$r->addPostData('monkey','uncle');

print $r->_buildRequest();
?>

The request that this code prints is something like:

POST /submit.php HTTP/1.1
User-Agent: PEAR HTTP_Request class ( http://pear.php.net/ )
Content-Type: application/x-www-form-urlencoded
Connection: close
Host: www.example.com
Content-Length: 12

monkey=uncle
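The Content-Length value above is just the byte length of the form-encoded body. The following sketch (plain PHP, not HTTP_Request's own code) reproduces it with http_build_query( ) and strlen( ). The same arithmetic explains the Content-Length: 23 in the two-field cURL request later in this recipe. The length must be computed after encoding, because encoding can change it.

```php
<?php
// Reproduce the body of the POST request above and its Content-Length.
$body = http_build_query(['monkey' => 'uncle']);   // "monkey=uncle"
$length = strlen($body);                            // 12 bytes

// Two fields are joined with "&", giving a 23-byte body.
$body2 = http_build_query(['monkey' => 'uncle', 'rhino' => 'aunt']);

// Encoding changes the length: "&" becomes "%26" and a space becomes "+".
$body3 = http_build_query(['note' => 'a&b c']);     // "note=a%26b+c"

printf("Content-Length: %d\r\n\r\n%s\n", $length, $body);
```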

Accessing response headers with the http stream wrapper is possible, but you have to use a function such as fopen( ) that gives you a stream resource. After the request has been made, pass that stream resource to stream_get_meta_data( ); one piece of the metadata it returns is the set of response headers. The following example demonstrates how to access response headers with a stream resource.

Getting response headers with the http stream

<?php
$url = 'http://www.example.com/submit.php';
$stream = fopen($url, 'r');
$metadata = stream_get_meta_data($stream);
// The headers are stored in the 'wrapper_data'
foreach ($metadata['wrapper_data'] as $header) {
    print $header . "\n";
}
// The body can be retrieved with
// stream_get_contents()
$response_body = stream_get_contents($stream);
fclose($stream);
?>

stream_get_meta_data( ) returns an array of information about the stream. The wrapper_data element of that array contains wrapper-specific data. For the http wrapper, that means the response headers, one header per array element. The previous example prints something like:

HTTP/1.1 200 OK
Date: Sun, 07 May 2006 18:24:37 GMT
Server: Apache/2.2.2 (Unix)
Last-Modified: Sun, 07 May 2006 01:58:12 GMT
ETag: "1348011-7-16167500"
Accept-Ranges: bytes
Content-Length: 7
Connection: close
Content-Type: text/plain
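The wrapper_data lines are raw strings, so looking up a particular header means parsing them. The following sketch uses a hypothetical parse_wrapper_data( ) function that turns them into a status code and a name-to-value map. It lowercases header names because HTTP header names are case-insensitive. It assumes that when wrapper_data contains more than one status line (as it may after the wrapper follows a redirect), only the last response matters. As a simplification, a repeated header such as Set-Cookie keeps only its last value.

```php
<?php
// Hypothetical helper: parse the 'wrapper_data' lines into a status code
// and an array of lowercase header name => value.
function parse_wrapper_data(array $lines): array {
    $code = null;
    $headers = [];
    foreach ($lines as $line) {
        if (preg_match('@^HTTP/\S+\s+(\d{3})@', $line, $m)) {
            // A new response starts here; discard headers from earlier ones.
            $code = (int) $m[1];
            $headers = [];
        } elseif (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return ['code' => $code, 'headers' => $headers];
}

// Sample input shaped like the wrapper_data output shown above.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Length: 7',
    'Content-Type: text/plain',
];
$parsed = parse_wrapper_data($sample);
print $parsed['code'] . ' ' . $parsed['headers']['content-type'] . "\n";
```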

The fopen( ) function accepts an optional stream context, which lets you customize the request (for example, to change its method or add headers). Pass the context as the fourth argument to fopen( ). (The second argument is the mode, and the third is an optional flag indicating whether to search include_path for the file.)
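The stream example above only makes a GET request. As a sketch of how a context changes that, the following builds a context whose http options ask the wrapper for a POST with a form-encoded body, mirroring the HTTP_Request example. The post_with_context( ) function is a hypothetical helper that passes the context as fopen( )'s fourth argument and then reads headers and body as before.

```php
<?php
// Build a context that makes the http wrapper send a POST request.
$body = http_build_query(['monkey' => 'uncle']);
$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                   . 'Content-Length: ' . strlen($body) . "\r\n",
        'content' => $body,
    ],
]);

// Hypothetical helper: open the URL with the context as the fourth
// argument to fopen(), then return the response headers and body.
function post_with_context(string $url, $context): array {
    $stream = fopen($url, 'r', false, $context);
    if ($stream === false) {
        throw new RuntimeException("Can't open $url");
    }
    $metadata = stream_get_meta_data($stream);
    $response_body = stream_get_contents($stream);
    fclose($stream);
    return [$metadata['wrapper_data'], $response_body];
}
```

Calling post_with_context('http://www.example.com/submit.php', $context) would then send the POST and return the response header lines and the body.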

With cURL, include response headers in the output from curl_exec( ) by setting the CURLOPT_HEADER option, as shown in the following example.

Getting response headers with cURL

<?php
$c = curl_init('http://www.example.com/submit.php');
curl_setopt($c, CURLOPT_HEADER, true);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, 'monkey=uncle&rhino=aunt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$response_headers_and_page = curl_exec($c);
curl_close($c);
?>
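With CURLOPT_HEADER set, the headers and the page arrive in one string. One way to separate them, sketched here with a hypothetical split_curl_response( ) function, is to cut the string at the header size that curl_getinfo( ) reports for CURLINFO_HEADER_SIZE. That value must be read after curl_exec( ) and before curl_close( ). Using the reported size avoids guessing at the first blank line. That guess can fail because the output may contain more than one header block, such as an interim "100 Continue" response.

```php
<?php
// Hypothetical helper: split curl_exec() output (with CURLOPT_HEADER set)
// into the raw header block and the body. $headerSize is assumed to come
// from curl_getinfo($c, CURLINFO_HEADER_SIZE).
function split_curl_response(string $raw, int $headerSize): array {
    return [
        'headers' => substr($raw, 0, $headerSize),
        // Cast guards against substr() returning false on older PHP
        // versions when the body is empty.
        'body'    => (string) substr($raw, $headerSize),
    ];
}
```

In the example above, you would call $size = curl_getinfo($c, CURLINFO_HEADER_SIZE) right after curl_exec($c), then pass $response_headers_and_page and $size to split_curl_response( ).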

To write the response headers directly to a file, open a filehandle with fopen( ) and set CURLOPT_WRITEHEADER to that filehandle, as shown in the following example.

Writing response headers to a file with cURL

<?php
$fh = fopen('/tmp/curl-response-headers.txt','w') or die("Can't open /tmp/curl-response-headers.txt");
$c = curl_init('http://www.example.com/submit.php');
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, 'monkey=uncle&rhino=aunt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_WRITEHEADER, $fh);
$page = curl_exec($c);
curl_close($c);
fclose($fh) or die("Can't close /tmp/curl-response-headers.txt");
?>

cURL's CURLOPT_VERBOSE option causes curl_exec( ) and curl_close( ) to print debugging information to standard error, including the contents of the request, as shown in the following example.

Verbose output from cURL

<?php
$c = curl_init('http://www.example.com/submit.php');
curl_setopt($c, CURLOPT_VERBOSE, true);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, 'monkey=uncle&rhino=aunt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($c);
curl_close($c);
?>

This prints something like:

* Connected to www.example.com (10.1.1.1)
> POST /submit.php HTTP/1.1
Host: www.example.com
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Content-Length: 23
Content-Type: application/x-www-form-urlencoded

monkey=uncle&rhino=aunt* Connection #0 left intact
* Closing connection #0

Because cURL prints the debugging information to standard error rather than standard output, it can't be captured with output buffering. You can, however, open a filehandle for writing and set CURLOPT_STDERR to that filehandle to divert the debugging information to a file, as shown in the following example.

Writing cURL verbose output to a file

<?php
$fh = fopen('/tmp/curl.out','w') or die("Can't open /tmp/curl.out");
$c = curl_init('http://www.example.com/submit.php');
curl_setopt($c, CURLOPT_VERBOSE, true);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, 'monkey=uncle&rhino=aunt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_STDERR, $fh);
$page = curl_exec($c);
curl_close($c);
fclose($fh) or die("Can't close /tmp/curl.out");
?>

Another way to access response headers with cURL is to write a "header function." This is similar to a cURL "write function," except that it is called to handle response headers instead of the response body. cURL calls it once for each header line and expects it to return the number of bytes it handled. The following example defines a HeaderSaver class whose header( ) method can be used as a header function to accumulate response headers.

Using a cURL header function

<?php

class HeaderSaver {
    public $headers = array();
    public $code = null;

    public function header($curl, $data){
        if (is_null($this->code) &&
            preg_match('@^HTTP/\d\.\d (\d+) @',$data,$matches)) {
            $this->code = $matches[1];
        } else {
            // Remove the trailing newline
            $trimmed = rtrim($data);
            if (strlen($trimmed)) {
                // If this line begins with a space or tab, it's a
                // continuation of the previous header
                if (($trimmed[0] == ' ') || ($trimmed[0] == "\t")) {
                    // Collapse the leading whitespace into one space
                    $trimmed = preg_replace('@^[ \t]+@',' ', $trimmed);
                    $this->headers[count($this->headers)-1] .= $trimmed;
                }
                // Otherwise, it's a new header
                else {
                    $this->headers[] = $trimmed;
                }
            }
        }
        return strlen($data);
    }

}

$h = new HeaderSaver();
$c = curl_init('http://www.example.com/plankton.php');
// Register the header function
curl_setopt($c, CURLOPT_HEADERFUNCTION, array($h,'header'));
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($c);
curl_close($c);
// Now $h is populated with data
print 'The response code was: ' . $h->code . "\n";
print "The response headers were: \n";
foreach ($h->headers as $header) {
    print "  $header\n";
}
?>

The HTTP/1.1 standard specifies that a header can span multiple lines if each additional line begins with at least one space or tab character. The header arrays returned by stream_get_meta_data( ) and HTTP_Request::getResponseHeader( ) do not properly handle multiline headers, though: the additional lines of a header are treated as separate headers. The HeaderSaver class, however, correctly combines the additional lines of multiline headers.
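If you already have a whole header block as a string (for example, the file written via CURLOPT_WRITEHEADER), you can apply the same unfolding rule in one pass. The following is a sketch with a hypothetical unfold_headers( ) function. It replaces each line break that is followed by spaces or tabs with a single space, which matches what HeaderSaver::header( ) does line by line, and then splits the result into one header per element. RFC 7230, which replaced RFC 2616, later deprecated this folding, but you may still meet it from older servers.

```php
<?php
// Hypothetical helper: unfold continuation lines in a raw header block
// and return one complete header per array element.
function unfold_headers(string $block): array {
    // A line break followed by spaces/tabs continues the previous header;
    // collapse the break and the leading whitespace into one space.
    $unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $block);
    $lines = preg_split('/\r?\n/', $unfolded);
    // Drop empty lines, such as the blank line that ends the block.
    return array_values(array_filter($lines, fn($l) => $l !== ''));
}
```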

See Also

Documentation on curl_setopt( ) at http://www.php.net/curl-setopt, on stream_get_meta_data( ) at http://www.php.net/stream_get_meta_data, on fopen( ) at http://www.php.net/fopen, and on the PEAR HTTP_Request class at http://pear.php.net/package/HTTP_Request; the syntax of an HTTP request is defined in RFC 2616 and available at http://www.w3.org/Protocols/rfc2616/rfc2616.html. The rules about multiline message headers are in Section 4.2: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2. The netcat program is available from the GNU Netcat project at http://netcat.sourceforge.net/.


