HTTP Technology Primer

HTTP is the basis for Web browsing. HTTP is built upon TCP/IP and is considered an application-level protocol for distributed, collaborative, hypermedia information systems. It is a request/response-oriented protocol where an HTTP client makes a request, and an HTTP server services that request and subsequently responds.

When looking at HTTP from an application-programming point of view, the first thing to understand is that HTTP is connectionless and stateless. HTTP is based upon a Web server (sometimes referred to as HTTPD [HTTP daemon]) receiving a request and then formulating a response to a client. It is connectionless because each request/response exchange stands on its own; the server does not rely on a lasting connection to the client between requests. It is stateless because neither the client nor the server retains any state information regarding the application data. It is up to the application programmer to maintain any state information necessary to the application[1].

[1] We will see later that the servlet API provides ways of maintaining application state information, but this is not part of HTTP.

In most cases, the client is a Web browser, but it could also be an application, a Java applet, or another Web server. While this request/response protocol is not as sophisticated as the newer connection-oriented protocols such as the Internet Inter-ORB (Object Request Broker) Protocol (IIOP), it has proved very flexible in allowing a wide variety of vendors to create Web servers, Web browsers, and other HTTP-based systems.

1 Uniform Resource Identifiers

URIs have been known by many names: Universal Resource Identifiers, Universal Resource Locators, WWW addresses, Uniform Resource Locators (URLs), and Uniform Resource Names (URNs). URLs and URNs are both kinds of URIs. A URL identifies a resource by its location, that is, by the scheme (such as http or ftp) and the address used to reach it, while a URN identifies a resource by a name that does not depend on where the resource lives. As far as HTTP is concerned, URIs are simply formatted strings that identify a resource by name, location, or any other characteristic. URIs in HTTP can be represented in absolute form or relative to some known base, depending upon the context of their use. The two forms differ in that an absolute URI always begins with a scheme (protocol) name followed by a colon.

HTTP does not place any limit on the length of a URI, so an HTTP server must be able to handle the URI of any resource it serves. However, programmers formulating a URI ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.

1.1 HTTP URL

Each Web resource (an HTML page, a JSP page, a servlet, etc.) that can be requested from a Web server must have a unique name associated with it. That unique name is called a URL. For discussion purposes, let's consider a URL as a way to uniquely identify a Web page that exists on a particular Web server. For example, to access the index.html page on the www.abc.com Web server, the absolute and explicit URL would be http://www.abc.com/index.html. The format of a URL is as follows, where the angle brackets mark the port as optional (it defaults to 80 for HTTP):

protocol://hostname<:port>/identifiers
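
The format above can be explored with the standard java.net.URI class. The following is only a minimal sketch: the class name UrlParts and the port 8080 in the second call are assumptions made for illustration, not part of the original example.

```java
import java.net.URI;

// Illustrative helper (hypothetical name): splits an absolute HTTP URL
// into the protocol://hostname<:port>/identifiers parts.
final class UrlParts {

    // Returns "protocol|hostname|port|identifiers".
    // When the URL carries no explicit port, the HTTP default of 80 is reported.
    static String describe(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort() == -1 ? 80 : uri.getPort(); // -1 means "no port given"
        return uri.getScheme() + "|" + uri.getHost() + "|" + port + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.abc.com/index.html"));
        // The explicit port below is an assumed value for illustration.
        System.out.println(describe("http://www.abc.com:8080/index.html"));
    }
}
```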

1.2 What is the difference between a URI and URL?

According to the specification [RFC 2396], all URLs are URIs. However, URIs allow Web services to be defined in such a way that they are not bound to a specific server. This has many advantages.

  1. A URL explicitly ties a service to a specific Web server and port. If it becomes necessary to create services and pages that can be hosted on various Web servers, there arises a need for a way to identify those pages without locating them. The URI provides the unique name for a service, whichever Web server hosts it.

  2. In addition to being able to relocate a service or set of pages to another Web server, many times it is desirable to replicate these services or pages to several servers to avoid the single-point-of-failure problem. If one of these servers terminates, then other mechanisms such as a network dispatcher can safely send requests to another Web server, which has replicas of those services or pages. The use of a nonlocated URI helps the developer to avoid making code changes to the service or page when deploying to different machines.

An Example of URLs and URIs

A fully specified URL is always of the form:

http://www.mycompany.com/mydirectory/mypage.html

However, a URI may be relative, giving only part of the address:

/mydirectory/mypage.html

This difference will become important when we get to the point of specifying URIs that refer to HTML and JSP pages. By referring only to a partial address (i.e., a relative URI), you keep your HTML tags (and JSP and servlet code) from being tied to a single machine name.
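
To see why a partial address is not tied to one machine, the sketch below resolves the same relative URI against two different bases using java.net.URI.resolve. The class name RelativeUriDemo and the second host, mirror.example.com, are assumptions made purely for illustration.

```java
import java.net.URI;

// Illustrative helper (hypothetical name): the same relative URI
// resolves against whichever server the enclosing page came from.
final class RelativeUriDemo {

    // Resolves a relative URI against the URL of the page that contains it.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String relative = "/mydirectory/mypage.html";
        System.out.println(resolve("http://www.mycompany.com/index.html", relative));
        // mirror.example.com is an assumed replica host, not from the text.
        System.out.println(resolve("http://mirror.example.com/index.html", relative));
    }
}
```

The page markup stays the same in both cases; only the base it was served from changes.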


2 Requests, Responses, and Headers

HTTP is a simple protocol based on a client sending a request to a Web server and then getting a response. When the client sends a request, the request contains all of the information that the Web server needs to process the request. Both the request and the response contain a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.
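
The message layout just described (start-line, header fields, empty line, optional message-body) can be sketched as a small parser. This is a simplified illustration under stated assumptions: the class name HttpMessage is hypothetical, and the code ignores details such as repeated headers and folded header lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (hypothetical class): splits a raw HTTP message into
// its start-line, header fields, and message-body.
final class HttpMessage {
    final String startLine;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpMessage(String raw) {
        // The first empty line (CRLF directly after a CRLF) ends the header fields.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) { // "Name: value"
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        HttpMessage m = new HttpMessage(
                "GET /index.html HTTP/1.1\r\nHost: www.abc.com\r\n\r\n");
        System.out.println(m.startLine);
        System.out.println(m.headers);
    }
}
```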

2.1 Headers

The headers section of a message contains a general-header section (headers that apply to both requests and responses), an entity-header section, and either a request-header section or a response-header section, depending upon the type of message. The general-header section contains items such as Cache-Control, Date, and Transfer-Encoding. The Transfer-Encoding header can affect the message length, because the encoding applied may increase the size of the message body.

The request-header section contains headers such as Host, Accept-Charset, and Referer. The Referer header specifies the URL of the page from which the request came, while the Host header contains the name of the target host specified in the request (the host that is processing the request).

The response-header section contains headers such as Age, Location, and Server. The Server header specifies the name of the server that generated the response.

Entity headers define metainformation about the entity-body or, if no body is present, about the resource identified by the request. Entity headers include Allow, Content-Encoding, and Last-Modified.

2.2 Requests

In the case of a request message, the start-line is the request-line. It consists of a method token, followed by a Request-URI and a protocol version, with the elements separated by spaces and the line ending with a CRLF. The method token is one of GET, POST, OPTIONS, HEAD, PUT, DELETE, TRACE, CONNECT, or some extension method as defined by the implementation.
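
As a rough illustration of that structure, the sketch below splits a request-line into its three elements. The class name RequestLine is an assumption, and the parser is deliberately strict and minimal rather than a complete implementation.

```java
// Illustrative sketch (hypothetical class): parses "Method SP Request-URI SP HTTP-Version".
final class RequestLine {
    final String method;
    final String requestUri;
    final String version;

    RequestLine(String line) {
        // trim() drops the trailing CRLF before the three elements are split on spaces.
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected: method, Request-URI, version");
        }
        method = parts[0];
        requestUri = parts[1];
        version = parts[2];
    }

    public static void main(String[] args) {
        RequestLine r = new RequestLine("GET /index.html HTTP/1.1\r\n");
        System.out.println(r.method + " | " + r.requestUri + " | " + r.version);
    }
}
```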

When using HTTP methods to create a request, the application programmer should understand that the authors of the HTTP specification consider some methods safe and others unsafe. The specification defines safe methods so that user agents can make a user aware that a possibly unsafe action is being requested. Safe methods are expected not to generate side effects when called. The protocol does not enforce this idea of safe methods, nor can it, as implementers are free to create servers that handle these requests in any way they see fit. Two HTTP request methods are particularly important to the programmer: GET and POST. GET is a safe method, while POST is unsafe, since a POST is expected to cause side effects by posting some new data:

  • GET— An HTTP GET request is what happens when you type in a URL at a browser. It literally means, "GET a file and return its contents." In the context of a servlet, this means return some dynamic content to the user as HTML.

  • POST— An HTTP POST request is what happens (usually) when you type information into an HTML form and press SUBMIT. It is called post because it was originally intended to represent POSTing a message to an Internet newsgroup.
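
A user agent could apply the safe/unsafe distinction with a check along the following lines. This is a sketch under assumptions: the class and method names are hypothetical, and treating HEAD as safe alongside GET follows the HTTP/1.1 specification (RFC 2616) rather than anything stated in this text.

```java
import java.util.Set;

// Illustrative sketch (hypothetical names): decides whether a user agent
// might want to warn the user before sending a request.
final class SafeMethods {

    // GET comes from the text above; HEAD is added per RFC 2616 (an assumption here).
    private static final Set<String> SAFE = Set.of("GET", "HEAD");

    // Method tokens are case-sensitive, so no case folding is applied.
    static boolean isSafe(String method) {
        return SAFE.contains(method);
    }

    static boolean shouldWarnUser(String method) {
        return !isSafe(method);
    }

    public static void main(String[] args) {
        System.out.println("GET safe?  " + isSafe("GET"));
        System.out.println("POST safe? " + isSafe("POST"));
    }
}
```

As the text notes, this is only a convention: nothing stops a server from performing side effects in response to a GET.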

2.3 Responses

After receiving and interpreting a request, the server must respond. The start-line of the response message is the status-line, which reports the outcome of the request. The status-line contains the HTTP protocol version followed by a numeric status code and its associated textual phrase, with each element separated by spaces. The status code is a 3-digit integer result code of the attempt to understand and satisfy the request. The textual phrase is a short description intended for human readers, which makes it useful for debugging; programs should rely on the status code instead.

The first digit in the 3-digit code defines the class of the response. The last two digits do not have any categorization role but instead help to uniquely identify the response. There are 5 values for the first digit:

  • 1xx: Informational— Request received, continuing process

  • 2xx: Success— The action was successfully received, understood, and accepted

  • 3xx: Redirection— Further action must be taken in order to complete the request

  • 4xx: Client Error— The request contains bad syntax or cannot be fulfilled

  • 5xx: Server Error— The server failed to fulfill an apparently valid request
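
The status-line format and the five classes listed above can be sketched as follows. The class name StatusLine is an assumption, and the parser is a minimal illustration rather than a full implementation.

```java
// Illustrative sketch (hypothetical class): parses "HTTP-Version SP Status-Code SP Reason-Phrase"
// and maps the first digit of the status code to its class.
final class StatusLine {
    final String version;
    final int code;
    final String reason;

    StatusLine(String line) {
        // Limit the split to 3 so that a phrase such as "Not Found" stays in one piece.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected: version, status code, phrase");
        }
        version = parts[0];
        code = Integer.parseInt(parts[1]);
        reason = parts.length == 3 ? parts[2] : "";
    }

    // Only the first digit decides the class of the response.
    String statusClass() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine s = new StatusLine("HTTP/1.1 404 Not Found\r\n");
        System.out.println(s.code + " " + s.reason + " -> " + s.statusClass());
    }
}
```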

As with any HTTP message, after the start-line (status-line in the response case), the message headers are given, followed by the message-body. The message-body contains the actual data, which will be displayed in the Web browser.

3 Pulling It All Together

HTTP can be used in numerous scenarios. In an effort to pull together the ideas presented here about URIs and messages over HTTP, we need to take a look at the GET and POST requests and how the interaction between the client and server occurs.

We show a GET request round-trip in Figure 2. The request is for the URL http://webserver/index.html. Note that when using a Web browser and HTML to make HTTP requests, a GET request can be made in several ways. Here are the well-known ways:

  1. By typing the URL into the URL line of the Web browser and pressing ENTER.

  2. By clicking on a link, which appears inside an HTML page. The link is coded using the <A HREF="url…"> tag.

  3. By clicking a button on a form, which appears inside of an HTML page. The FORM tag would need to specify a method of GET as in the tag <FORM method="GET" action="url…">.

Figure 2. HTTP GET request in action.

The Web server, upon receiving this request, maps the request to a file located on the Web server file system (e.g., C:\www\html\index.html) and then responds with the contents of that file to the browser. The entire transaction involves one connection to the Web server and an almost immediate response.
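
The text a client sends for this GET round-trip can be sketched as below. The class name GetRequestText is hypothetical, and the choice of HTTP/1.1 with only a Host header is an assumption; a real browser sends many more headers.

```java
import java.net.URI;

// Illustrative sketch (hypothetical class): builds the raw text of a GET request
// for a URL such as http://webserver/index.html.
final class GetRequestText {

    static String build(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // a URL without a path refers to the server root
        }
        return "GET " + path + " HTTP/1.1\r\n"   // request-line
             + "Host: " + uri.getHost() + "\r\n" // request header
             + "\r\n";                           // empty line: end of headers, no body
    }

    public static void main(String[] args) {
        System.out.print(build("http://webserver/index.html"));
    }
}
```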

In Figure 3, we show a POST request round-trip. This request is for the URL http://webserver/servlet/Register. Note that when using a Web browser and HTML to make HTTP requests, a POST request can only be made by clicking a button on a FORM that appears inside an HTML page. The FORM would need to specify a method of POST, as in the tag <FORM method="POST" action="url…">.

Figure 3. HTTP POST request in action.

The Web server, upon receiving this request, passes it to the servlet engine, which locates the requested servlet on the classpath, loads it if necessary, and runs it. Next, the servlet, running inside the Web container, reads the posted data and performs the requested operation. Lastly, the Web server (or servlet engine) responds with a message that is displayed in the browser. The entire transaction involves at least one connection to the Web server with an almost immediate response.
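
The posted data travels in the message-body of the request. The sketch below shows roughly what a browser might send when a form is submitted to /servlet/Register. The class name and the form fields (name, email) are assumptions for illustration, as is the use of HTTP/1.1; the form encoding shown is the usual default for HTML forms. It relies on the URLEncoder.encode(String, Charset) overload, which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative sketch (hypothetical class): builds the raw text of a form POST.
final class PostRequestText {

    // Encodes form fields as name=value pairs joined by '&'.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    static String build(String host, String path, Map<String, String> fields) {
        String body = encodeForm(fields);
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + body.getBytes(StandardCharsets.US_ASCII).length + "\r\n"
             + "\r\n"  // empty line separates the headers from the message-body
             + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Jane Doe");          // assumed form field
        fields.put("email", "jane@example.com"); // assumed form field
        System.out.println(build("webserver", "/servlet/Register", fields));
    }
}
```

Unlike the GET case, the data does not appear in the Request-URI; the servlet reads it from the body.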

