March 28, 2011, 12:41 p.m.
posted by superj
Java and Firewalls
This section considers how Java security can be affected when firewall systems are used on the network. Various firewall implementations can affect the proper working of a network connection through a firewall.
A firewall is any computer system, network hardware, or system/hardware combination that links two or more networks and enforces an access control policy between them. The intent is that one side of the network, the secure network, is protected from malicious entities in the other part of the network, the nonsecure network, as shown in Figure. This concept is analogous to a building's solid firewalls, which prevent a fire from spreading from one part of the building to another.
Sometimes, a single hardware system is called a firewall; other times, a complex collection of multiple routers and servers implements the firewall function. The National Computer Security Association (NCSA) has created tests to enforce minimum standards for a firewall, but that has not stopped some vendors from using the term creatively. Here, we are concerned only with the policies enforced by the firewall and their effect on the data traffic.
Depending on their configuration, firewalls can affect any type of network traffic. Two such types of traffic are
Firewalls may be present in the client network, the server network, or both.
Current literature on firewalls is filled with buzzwords describing the various software techniques used to create firewalls. Techniques include packet filtering, application gateways, proxy servers, dynamic filters, bastion hosts, DMZs, and dual-homed gateways. For the purpose of this book, we concentrate on the data packets flowing through the firewall. The basic security functions of any firewall are to examine data packets sent through the firewall and to accept, reject, or modify the packets according to the security policy requirements. Most of today's firewalls work only with TCP/IP data.
1 TCP/IP Packets
All network traffic exchange is performed by sending blocks of data between two connected systems. The blocks of data are encapsulated within a data packet by adding header fields to control what happens to the data block en route and when it reaches its final destination. Network architectures are constructed of layers of function, each built on the services of the layer beneath it. The most thorough layered architecture is the Open Systems Interconnection (OSI) model, whereas other architectures, such as TCP/IP, use broader layer definitions. On the wire, these layers are translated into a series of headers placed before the data being sent, as shown in Figure.
The first part of the header, the data link/physical header, is determined by the type of network. Ethernet, token-ring, serial lines, Fiber Distributed Data Interface (FDDI) networks: Each type has its own headers, containing synchronization, start-of-packet identifiers, access control, and physical addresses as required by the network type. There may be fields to distinguish IP packets from other types of packets, such as NetBIOS or SNA. We consider only IP packets here.
The next part of the header of IP packets is the standard IP header, which specifies the originator, or source address, and the intended recipient, or destination address, together with fields to control how the packet is forwarded through the Internet. IP headers adhere to the IP standard.
Next is the transport layer header, which controls what happens to the packet when it reaches its destination. Almost all the user-level protocols commonly referred to as TCP/IP use either a TCP or a User Datagram Protocol (UDP) header at the transport layer. Finally, application protocol headers and data are contained in the payload portion of the packet and are passed from the sending process to the receiving process.
Each packet header contains a number of data fields, which may be examined by a firewall and used to decide whether to accept or reject the data packet. The most important data fields are the source and destination IP addresses and the source and destination port numbers.
The source and destination IP addresses logically identify the machines at each end of the connection and are used by intermediate machines to route the packet through the network. An IP address identifies a physical or logical network interface on the machine, which allows a single machine to have several IP addresses.
The TCP/IP networking software uses the source and destination port numbers at each end to send the packets to the appropriate program running on the machines. Standard port numbers are defined for the common network services; for example, by default, a File Transfer Protocol (FTP) server expects to receive TCP requests addressed to port 21; an HTTP Web server, to port 80.
However, nonstandard ports may be used. It is quite possible to put a Web server on port 21 and access it with a URL of http://serverName:21/. Because of this possibility, some firewall systems examine the inside details of the protocol data, not just headers, to ensure that only valid data can flow through.
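To illustrate that the port is independent of the protocol named in the URL, here is a minimal sketch that parses the example URL with java.net.URI. The class name PortFromUrl and the small table of default ports are assumptions made for this illustration.

```java
import java.net.URI;

class PortFromUrl {
    /** Returns the explicit port in the URL, or an assumed default for the scheme when none is given. */
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();      // explicit port, such as :21 in the example URL
        }
        // Assumed defaults for this sketch; only http and ftp are handled.
        switch (uri.getScheme()) {
            case "http": return 80;
            case "ftp":  return 21;
            default:     return -1;
        }
    }
}
```

For http://serverName:21/ the scheme is still http, but the connection goes to port 21, which is why a filter that looks only at port numbers cannot tell which protocol is actually in use.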
As an elementary security precaution, port numbers less than 1024 are privileged ports. On some systems, such as UNIX, programs without the appropriate privileges are prevented from listening on these ports. On less secure operating systems, a program can listen on any port. HTTP Web servers, in particular, are often run on nonstandard ports, such as 8000 or 8080, to avoid requiring the privileged standard port 80.
The nonprivileged ports of 1024 and above can be used by any program; when a connection is created, a free port number will be allocated to the program. For example, a Web browser opening a connection to a Web server might be allocated port 1044 to communicate with server port 80. But what happens if a Web browser from another client also gets allocated port 1044? The two connections are distinguished by looking at all four values—source IP address, source port, destination IP address, destination port—as this group of values is guaranteed to be unique by the TCP standards.
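The four-value identification can be sketched as a small key class. This is only an illustration: the class name ConnectionKey and the addresses in the usage note are hypothetical, and real TCP stacks keep this information in their own internal tables.

```java
import java.util.Objects;

/** Identifies one TCP connection by its four addressing values. */
final class ConnectionKey {
    final String sourceAddress;
    final int sourcePort;
    final String destinationAddress;
    final int destinationPort;

    ConnectionKey(String sourceAddress, int sourcePort,
                  String destinationAddress, int destinationPort) {
        this.sourceAddress = sourceAddress;
        this.sourcePort = sourcePort;
        this.destinationAddress = destinationAddress;
        this.destinationPort = destinationPort;
    }

    /** Two keys are equal only when all four values match. */
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ConnectionKey)) {
            return false;
        }
        ConnectionKey k = (ConnectionKey) other;
        return sourcePort == k.sourcePort
                && destinationPort == k.destinationPort
                && sourceAddress.equals(k.sourceAddress)
                && destinationAddress.equals(k.destinationAddress);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sourceAddress, sourcePort, destinationAddress, destinationPort);
    }
}
```

Two browsers on different clients, say 192.0.2.10 and 192.0.2.11, can both be allocated port 1044 and talk to the same server port 80; their keys still differ because the source addresses differ.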
2 Program Communication through a Firewall
Simple packet-filtering firewalls use the source and destination IP addresses and ports to determine whether packets may pass through the firewall. The firewall may permit packets going to a Web server on destination port 80 and the replies on source port 80 but reject packets to other port numbers. This restriction may be allowed in one direction only and may be further restricted by allowing only packets to and from a particular group of Web servers, as shown in Figure.
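The rule just described might be sketched as follows. This is a simplified, stateless illustration rather than how any particular firewall product is implemented; the class name WebOnlyFilter and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WebOnlyFilter {
    private static final int HTTP_PORT = 80;
    private final Set<String> allowedServers;

    /** The permitted group of Web servers, given as IP address strings. */
    WebOnlyFilter(String... allowedServers) {
        this.allowedServers = new HashSet<>(Arrays.asList(allowedServers));
    }

    /** Outbound request: accepted only when addressed to port 80 on a permitted Web server. */
    boolean acceptOutbound(String destinationAddress, int destinationPort) {
        return destinationPort == HTTP_PORT && allowedServers.contains(destinationAddress);
    }

    /** Inbound reply: accepted only when it comes from port 80 on a permitted Web server. */
    boolean acceptInbound(String sourceAddress, int sourcePort) {
        return sourcePort == HTTP_PORT && allowedServers.contains(sourceAddress);
    }
}
```

A filter built this way rejects a request to port 8080 on a permitted server just as it rejects a request to port 80 on a server outside the group.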
Data may need to pass through more than one firewall. Users in a corporate network often have a firewall between them and the Internet in order to protect the entire corporate network. And at the other end of the connection, the remote server often has a firewall to protect it and its networks.
These firewalls may enforce different rules on what types of data are allowed to flow through. This difference can have consequences for Java programs and programs written in any other language. It is not uncommon to find that Java-enabled Web pages that work over a home Internet connection simply fail to run on a corporate network.
Two problem areas, discussed in Section 2.4.3 on page 45, are as follows.
Proxy servers and SOCKS gateways are two common approaches used to provide Internet access through corporate firewalls. The primary goal is to allow people within the company network the ability to access the World Wide Web (WWW) but to prevent people outside from accessing the company's internal networks.
2.1 Proxy Servers
A proxy server receives a request from a Web browser, performs that request on behalf of the browser—possibly after authorization checks—and returns the results to the browser (see Figure). Instead of sending a specific request to a particular server, a browser sends the request to a proxy system. Then, the proxy system contacts the server. This indirect approach has several advantages.
The disadvantages are that browser configuration is more complex, the added data transfers can add an extra delay to page access, and proxies sometimes impose additional restrictions, such as a timeout on the length of a connection, preventing very large downloads.
2.2 SOCKS Gateways
The other common approach to providing Internet access through a firewall is to use a SOCKS gateway. The SOCKS protocol allows users within a corporate firewall to access almost any TCP or UDP service outside the firewall but without allowing outsiders to get back inside. This approach works through TCP and SOCKS, together with a SOCKS server program running in the firewall system (see Figure).
SOCKS is a means of encapsulating any TCP connection within the SOCKS protocol. On the client system, within the corporate network, the data packets to be sent to an external system will be put inside a SOCKS packet and sent to a SOCKS server. For example, a request for http://www.fabrikam456.com/page.html would, if sent directly, be contained in a packet with the characteristics shown in Listing 2.1.
Destination address: www.fabrikam456.com
Destination port: 80 (HTTP)
Data: "GET /page.html"
If SOCKS were used, the packet sent would be, effectively, as in Listing 2.2.
Destination address: socks_server.local456.com
Destination port: TCP 1080 (SOCKS)
Data:
    Destination address = www.fabrikam456.com
    Destination port = TCP 80 (HTTP)
    Data = "GET /page.html"
When it receives this packet, the SOCKS server extracts the required destination address, port, and data and sends this packet; naturally, the source IP address will be that of the SOCKS server itself. The firewall will have been configured to allow these packets from the SOCKS server program, so they will not be blocked. Returning packets will be sent to the SOCKS server, which will encapsulate them similarly and pass them on to the original client, which in turn strips off the SOCKS encapsulation, giving the required data to the application. The advantage of all this is that the firewall can be configured very simply to allow any TCP/IP connection on any port, from the SOCKS server to the nonsecure Internet, trusting it to disallow any connections that are initiated from the Internet (see Figure).
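As a concrete illustration of how the real destination travels inside the SOCKS exchange, here is a minimal sketch that builds a connection request. It assumes the SOCKS version 4a CONNECT layout, since the text does not say which SOCKS version is in use. The class name Socks4aRequest is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class Socks4aRequest {
    /** Builds the bytes a client sends to the SOCKS server to ask for a connection to host:port. */
    static byte[] connect(String host, int port) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x04);                  // SOCKS protocol version 4
        out.write(0x01);                  // command code: CONNECT
        out.write((port >> 8) & 0xFF);    // destination port, high byte
        out.write(port & 0xFF);           // destination port, low byte
        out.write(0);                     // placeholder address 0.0.0.1 signals
        out.write(0);                     // that a host name follows (the 4a
        out.write(0);                     // extension), so the SOCKS server
        out.write(1);                     // resolves the name itself
        out.write(0);                     // empty user ID, null-terminated
        byte[] name = host.getBytes(StandardCharsets.US_ASCII);
        out.write(name, 0, name.length);  // destination host name
        out.write(0);                     // null terminator for the host name
        return out.toByteArray();
    }
}
```

Calling Socks4aRequest.connect("www.fabrikam456.com", 80) produces the request that would be sent to port 1080 on the SOCKS server. Once the server grants it, the application data, such as "GET /page.html", flows over the same connection.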
The disadvantage is that the client software must be modified to use SOCKS. The original approach was to recompile the network client code with a new SOCKS header file, which translated TCP system calls—connect, getsockname, bind, accept, listen, select—into new names: Rconnect, Rgetsockname, Rbind, Raccept, Rlisten, Rselect. When linked with the libsocks library, these new names will access the SOCKS version rather than the standard system version, resulting in a new, SOCKSified version of the client software.
This approach is still used for clients running on UNIX. On Windows operating systems, the dynamically linked libraries that implement the TCP calls can be replaced by a SOCKSified version, usually termed a SOCKSified TCP/IP stack. This SOCKSified stack can then be used with any client code, without the need to modify the client, requiring only the SOCKS configuration to be specified, that is, the address of the SOCKS server and information on whether to use SOCKS protocol or to make a direct connection.
2.3 Proxy Servers versus SOCKS Gateways
The three options of providing secure Internet access through corporate firewalls are to use
Each of these options has its own advantages and disadvantages for the company network security manager to evaluate for the company's particular environment. But what does the end user need to do to use these options?
Sending every request to the SOCKS server may overload it, because the same server also supports other clients. The SOCKSified stack therefore provides better support for deciding whether to use SOCKS. The stack is controlled by a configuration file that specifies which addresses are internal and can be handled directly and which addresses must go through the SOCKS server. Of course, if a SOCKSified stack is used, SOCKS should not be enabled in the client application configuration. However, as a SOCKSified stack is not available for all platforms, the client application's SOCKS configuration may have to be used.
The SOCKSified stack approach will also work with Java programs, as the classes in the java.net package will use the underlying TCP stack. Therefore, this approach provides a simple way of running Java programs using a SOCKS server through a firewall. But if a SOCKSified stack is not available, you will need to SOCKSify the library classes yourself if you have source code or look for a vendor that supports SOCKS.
3 The Effect of Firewalls on Java Programs
The effect of firewalls on Java programs can be considered from three points of view: loading them, stopping them, and the network connections that the programs themselves may create.
3.1 Using HTTP for Applet Downloading
Java applets within a Web page are transferred using HTTP when the browser fetches the class files referred to by the <APPLET> tag. So, if a Web page contains a tag like the one in Listing 2.3, the browser would transfer the Web page itself first, then the file Example.class, and then any class files referred to in Example.class.
<APPLET Code="Example.class" Width=300 Height=300>
<PARAM NAME=pname VALUE="example1">
</APPLET>
Each class file would be fetched with a separate HTTP request and, unless HTTP V1.1 persistent connections are used, over a separate TCP connection. Starting with Java Development Kit (JDK) V1.1, the Java language allows a more efficient transfer, whereby all the classes are combined into a compressed Java Archive (JAR) file. In this case, the Web page contains a tag like the one in Listing 2.4.
<APPLET Archive="example.jar" Code="Example.class" Width=300 Height=300>
</APPLET>
If there are problems finding example.jar or if an older browser that still runs a JDK V1.0 JVM is used, the archive option is ignored, and the code option is used instead, as in the previous example.
3.2 Using a Firewall to Stop Java Downloads
What effect do firewalls have on the downloading of Java class files? If the security policy is to allow HTTP traffic to flow through the firewall, Java programs and JAR files will simply be treated like any other component of a Web page and be transferred. But on the other hand, if HTTP is prohibited, it is going to be very difficult to obtain the applet class files, unless there is another way of getting them, such as using FTP. Quite frequently, Web servers using nonstandard TCP ports, such as 81, 8000, and 8080, may be blocked by the firewall, so if you are running a Web server, stick to the standard port 80 if you want as many people as possible to see your Web pages and applets.
Because Java programs are transferred using HTTP, the IP and TCP headers are indistinguishable from any other element of a Web page. Simple packet filtering based on IP addresses and port numbers will therefore not be able to block only Java programs. If more selective filtering is needed, an additional step beyond basic packet filtering will be required: examining the packet payload, or the HTTP data itself. This can be done with a suitable Web proxy server or an HTTP gateway that scans the data transferred.
If a Web proxy server is used, a common arrangement is to force all clients to go through the proxy server—inside the firewall—by preventing all HTTP access through the firewall, unless that access came from the proxy server itself (Figure). A user who does not have an arrangement like this can bypass the checking by connecting directly.
How can a Java class file inside the HTTP packet be identified? In an ideal world, there would be a standard MIME data type for Java classes. Thus, a Web browser might request:
Accept: application/java, application/jar
Firewalls could quite easily check for these requests and the Web server Content-Type: replies. In practice, however, servers respond with a variety of MIME types, such as:
This means that it is necessary to examine the data being transferred to see whether it might be Java bytecode or JAR files. Bytecode files must start with the hexadecimal number 0xCAFEBABE in the first four bytes (see Figure on page 236). This value, called the magic number, will also be found in bytecode files that are embedded in JAR files, but as a JAR file may be compressed, a scanner must work harder to find the signature. Commercial products are available that can perform this inspection. They usually work as, or with, an HTTP proxy server and check all HTTP requests passing through.
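The signature check can be sketched in a few lines. This is a simplified illustration, not how any commercial scanner works. The class name ContentSniffer is hypothetical, and the JAR check relies on the fact that JAR files use the ZIP format, whose entries begin with the bytes "PK" followed by 3 and 4.

```java
class ContentSniffer {
    /** True if the data begins with the class-file magic number 0xCAFEBABE. */
    static boolean looksLikeClassFile(byte[] data) {
        return data.length >= 4
                && (data[0] & 0xFF) == 0xCA
                && (data[1] & 0xFF) == 0xFE
                && (data[2] & 0xFF) == 0xBA
                && (data[3] & 0xFF) == 0xBE;
    }

    /** True if the data begins with the ZIP local-file signature used by JAR files. */
    static boolean looksLikeJar(byte[] data) {
        return data.length >= 4
                && data[0] == 'P'
                && data[1] == 'K'
                && data[2] == 3
                && data[3] == 4;
    }
}
```

Both checks look only at the first four bytes of a transfer. To find class files inside a compressed JAR, a scanner would have to unpack the archive entries and apply the first check to each one.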
Searching for the class file signature is an effective way to stop Java classes, but it indiscriminately prohibits good code and bad. The cleanest solution to the problem of selectively stopping Java code is the use of signed code. By certifying the originator of the code, one can permit Java bytecode from sites where the signer is trusted, such as your own company sites, and disallow other sites.
Another question may be whether to allow Java or any other type of executable content to travel through the firewall. A site with public Web servers would probably allow Java code to be sent to the Internet. But that site might wish to make restrictions on Java code that can be received.
The most permissive policy is to allow Java to be received and to let users adopt their own defenses or trust in the Java security model. More restrictive policies might allow Java classes only from trusted Web sites or not at all. The question is the degree of risk. As shown in this book, Java programs run on a well-configured JVM are very safe, compared with other types of executable content. Thus, if applets are to be blocked, other downloads also should be prevented. For example, macro viruses contained in word processor files are a major problem, but few companies prevent employees from exchanging such files with customers and suppliers.
3.3 Java Network Connections through the Firewall
In creating its own network connections through a firewall, a Java program faces all the difficulties described earlier, as well as the default SecurityManager restrictions that allow Java programs to contact only the server from which they were downloaded. One of the major problems people have encountered with Java programs and firewalls is trying to get the Java programs to communicate back to the server through a firewall. This problem becomes particularly evident
For example, the middle tier may be a fully functional WAS, hosting both servlets and enterprise beans, and the third tier may be the database system (see Figure on page 27). Alternatively, the Web server may be located in the DMZ, and the servlet and EJB engines may constitute the third tier, safely positioned in the intranet behind the firewall (see Figure on page 199 and Figure on page 200). With this architecture, all requests from the Internet get handled within the DMZ, and only authorized requests are allowed to proceed to enter into the intranet. In a more complicated architecture, an additional firewall may be used to separate the EJB engine from the servlet engine (see Figure on page 201). The additional firewall further restricts the requests that can reach the business applications.
From behind a firewall, a program can adopt two major approaches to retrieve data from a server outside the firewall: a URL connection or a socket connection. The first approach means using the URL classes from the java.net package to request data from a server using HTTP. Using a URL connection is easier to implement and is also likely to be the more reliable, as the Java runtime passes the URL request to the underlying application—browser or WAS—to process. Thus, if a proxy is defined, the Java code will automatically use it. However, URL connections suffer from the fact that the server side of the connection has limited capability; it can be only a simple file retrieval, a Common Gateway Interface (CGI) program, a servlet, or similar.
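Where the proxy is not picked up automatically, a standalone program can name it explicitly. The following minimal sketch uses the java.net.Proxy class, which is available in Java 5 and later. The class name ProxiedUrlFetch is hypothetical, as are the proxy host and port in the usage note.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

class ProxiedUrlFetch {
    /** Prepares a URL connection that will be routed through the given HTTP proxy. */
    static URLConnection open(String url, String proxyHost, int proxyPort) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        try {
            // openConnection only prepares the request; nothing is sent
            // until connect() or getInputStream() is called on the result.
            return URI.create(url).toURL().openConnection(proxy);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller might use ProxiedUrlFetch.open("http://www.fabrikam456.com/page.html", "proxy.local456.com", 8080) and then read the page from getInputStream() on the returned connection.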
The second approach—a socket connection from a Java program to a remote server—involves the use of classes from the java.net package to create socket connections to a dedicated server application. The program will need to choose a port number to connect to, but many programs will not be allowed to open a socket connection through the firewall. Some types of programs have no real choice as to port number. For example, IBM Host On-Demand, shown in Figure on page 24, is a Java applet that is a 3270 terminal emulator and hence needs to use the tn3270 protocol to TELNET port 23. It is quite likely that this standard port would be allowed through the firewall; otherwise, encapsulation of tn3270 inside the SOCKS protocol may be the only answer.
Other programs need to make a connection to the server but do not need any special port. It may be that they can use a nonprivileged server port of 1024 or greater, but often these, too, are blocked by simple packet-filtering firewalls. A flexible approach is to let the Java program be configurable to allow direct connections, if allowed, or to use the SOCKS protocol to pass through the firewall.
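One way to make the choice configurable is sketched below using the SOCKS support in java.net.Proxy, available in Java 5 and later. The class name ConfigurableConnector and the convention that an empty host name means "connect directly" are assumptions made for this illustration.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

class ConfigurableConnector {
    /** Returns Proxy.NO_PROXY for a direct connection, or a SOCKS proxy when a server is configured. */
    static Proxy chooseProxy(String socksHost, int socksPort) {
        if (socksHost == null || socksHost.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        return new Proxy(Proxy.Type.SOCKS,
                InetSocketAddress.createUnresolved(socksHost, socksPort));
    }

    /** Creates an unconnected socket that uses the chosen route once connect() is called. */
    static Socket newSocket(String socksHost, int socksPort) {
        return new Socket(chooseProxy(socksHost, socksPort));
    }
}
```

The program would read the SOCKS server name and port from its own configuration, call newSocket, and then connect the socket to the real server address. The rest of the code is the same for both routes.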
Many HTTP proxy servers implement the CONNECT method. This allows a client to send the proxy an HTTP request that names the host and port on the target system to which the proxy should connect. The CONNECT method was developed to allow SSL connections to be handled by a proxy server, but it has since been extended to other applications. For example, Lotus Notes servers can use it. The CONNECT method operates in a very similar way to SOCKS, and Java program connections can be implemented with it in much the same way as with SOCKS.
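The request a client sends to the proxy can be sketched as follows. The class name ConnectRequest is hypothetical, and the sketch shows only the request text, not the handling of the proxy's reply.

```java
class ConnectRequest {
    /** Builds the text of an HTTP CONNECT request asking the proxy to open a tunnel to host:port. */
    static String build(String host, int port) {
        String target = host + ":" + port;
        return "CONNECT " + target + " HTTP/1.1\r\n"
                + "Host: " + target + "\r\n"
                + "\r\n";
    }
}
```

If the proxy answers with a success status, it relays bytes in both directions on that connection from then on, and the client can speak any protocol to the target through it.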
Another approach is to disguise the packets in another protocol, most likely HTTP, as this will have been permitted through the firewall. This approach will allow a two-way transfer of data between the client program and the server but will require a special type of server, which must be able to communicate with the client programs to process their disguised network traffic.
Finally, Java client/server applications in the network can use remote object access mechanisms, such as Java RMI. A practical implementation of this approach is described in the next section.
3.4 RMI through the Firewall
Java RMI (see Appendix A on page 547) allows developers to distribute Java objects seamlessly across the Internet. This implies that RMI too needs to be able to cross firewalls.
The normal approach that RMI uses, in the absence of firewalls, is that the client applet will attempt to open a direct network connection to the RMI port (default is port 1099) on the server. The client will send its request to the server and receive its reply over this network connection.
The RMI designers have made provisions for two firewall scenarios, both using RMI calls embedded in HTTP requests, under the reasonable assumption that HTTP will be allowed through the firewall. The RMI server itself will accept either type of request and format its reply accordingly. The client sends an HTTP POST request, with the RMI call data sent as the body of the POST request, and the server returns the result in the body of an HTTP response.
In the first scenario (Figure), the proxy server is permitted by the firewall to connect directly to the remote server's RMI port (1099). The client code will make an HTTP POST request to http://rmi_server:1099/. This request passes across the Internet to the remote server, where it is found to be an encapsulated RMI call. Therefore, the reply is sent back as an HTTP response. In theory, this method could also be used with a SOCKS server, instead of a proxy server, if run by a SOCKSified application or if the TCP/IP stack is SOCKSified.
As well as assuming that the firewall on the client passes the RMI port, this scenario assumes that the remote firewall also accepts incoming requests directly to the RMI port. But in some organizations, the firewall manager may be reluctant to permit traffic to additional ports, such as the RMI port. An alternative configuration is available in case RMI data is blocked by either firewall (see Figure).
In the second scenario, the proxy server cannot use the RMI port directly, so the remote server, which supplied the applet, has a CGI-BIN program configured to forward HTTP requests on the normal port (80) to the RMI server's port 1099. This CGI-BIN program needs to be installed on the Web server in the cgi-bin directory. Once installed, it invokes the Java interpreter on the server, copies the standard CGI environment variables to Java properties, and passes the request on to the appropriate RMI server port. The reply is passed back to the Web server, which adds the HTTP header line and returns the response to the client. In principle, this scenario would allow the RMI server to reside on a different system from the remote Web server in a three-tier model (see Section 2.1.2 on page 25).
Fortunately, all this work is performed automatically in the java.rmi package, so the software developer need not be concerned about the details. It is necessary only to configure the RMI server correctly and to ensure that the client uses the automatic mechanism for encapsulating RMI.
In the current version of RMI, the client stub code checks for the presence (ignoring the value) of the system properties proxyHost or http.proxyHost in order to decide whether to try using the HTTP encapsulation. If you are using a Web browser and encapsulated RMI does not seem to work, try explicitly setting these properties, as the browser may be using its own HTTP proxy settings without setting proxyHost.
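Setting the properties explicitly might look like the following minimal sketch, which assumes an RMI release that still performs the HTTP encapsulation described here. The class name RmiProxySettings is hypothetical. The http.proxyPort property is the standard Java networking companion to http.proxyHost and is not mentioned in the text above.

```java
class RmiProxySettings {
    /** Sets the proxy system properties before any RMI call is made. */
    static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }
}
```

The same effect can be had without code by passing -Dhttp.proxyHost=... and -Dhttp.proxyPort=... on the java command line.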
All this automatic encapsulation is not free, of course. Encapsulated RMI calls are at least an order of magnitude slower than direct requests, and proxy servers may add extra delays to the process as they receive and forward requests.