June 16, 2011, 5:09 a.m.
posted by pypa
Using Apache Access Control Files
Apache is the most popular web server, and is especially popular among ISPs that provide web hosting. For that reason, when I discuss features of web servers, I'm going to discuss Apache. You can apply the general techniques described here to any web server, but the specifics will apply only to Apache. Under Apache, assuming the server administrator has given you the right to do so, you can set up access control files that you can use to manage access to your directories, along with a lot of other things. These days, many kinds of server configuration directives can be used on a per-directory basis from the access file.
The access control file is generally named .htaccess. It can actually be named anything that the server administrator chooses, but .htaccess is the default, and there's really no good reason to change it. Because the filename begins with a period, it's a hidden file under UNIX. This keeps it from cluttering things up when you're looking for content files in a directory.
Apache configuration directives begin with the directive, and the rest of the line consists of the parameters associated with the directive. They're entered one per line. The format is usually something like this:
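As a small illustration of that format, here are two directives that also appear later in this article, with explanatory comments added. In each case the first word is the directive name and everything after it is that directive's parameters.

```apache
# Directive "deny" with the parameters "from all"
deny from all
# Directive "AuthName" with one quoted parameter
AuthName "Administrator Files"
```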
Sometimes the configuration directives are slightly more complex, but that's the general format. The first type of configuration I'll explain how to manipulate is actual access to the pages in the directory (and its subdirectories).
Managing Access to Pages
Controlling access to pages can be somewhat complex, mainly because Apache's access control system is very flexible. First, let's look at the access control directives themselves. The four you need to pay attention to are allow, deny, require, and order. The allow and deny directives enable you to control access to pages based on the IP address or domain name of the computer the visitor is using. Let's say you want to disallow all users from samspublishing.com (meaning that they're using a computer on the samspublishing.com network) from some pages on your site. You could just stick
deny from samspublishing.com
in your .htaccess file. These directives match parts of hostnames, so even if the user's hostname is firewall.samspublishing.com, he'll still be denied. By the same token, you can deny based on IP address:
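deny from 192.168.1
A partial address like this one matches every computer whose IP address begins with 192.168.1. You can also match a broader part of the hostname, such as an entire top-level domain:
deny from com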
Needless to say, this would prevent anyone on a .com network from viewing your pages. That's pretty harsh. If you're not careful, you can restrict everyone from seeing your pages. If that's what you intend to do, you can just use this directive:
deny from all
Why would you want to do that? Well, it makes more sense when combined with two other directives: order and allow. Using order, you can specify the order in which deny and allow rules are applied. allow is just like deny, except that it allows users that meet the rule you specify to see the pages. So, you can write rules like this:
order deny,allow
deny from all
allow from samspublishing.com
This restricts use of your pages to only people on the samspublishing.com network. Note that there is no space after the comma in the order directive; Apache does not allow whitespace between the two keywords. Based on the order directive, the deny rule is applied first, shutting out everyone. Then if the user meets your allow rule, she's allowed in.
The last directive in this family is require. Rather than basing access on how the user's machine identifies itself, this directive is used to require user authentication. Before you can use it, you need to set a few things up.
First, you'll need to create a file containing a list of usernames and passwords that can be used to access your site. Apache helpfully provides a tool to create these files, called htpasswd. It's invoked in the following manner:
htpasswd -c /usr/local/apache/passwords account
The arguments are as follows: the flag -c tells htpasswd to create the password file. Use it only the first time, because if the file already exists, -c overwrites it. The next argument, /usr/local/apache/passwords, is the name of the password file. The last argument is the name of the account you want to create. The program will then ask you to type the password for the account and confirm it. Once you've done so, the account is created, along with the password file. At that point, you can add more accounts to your file by repeatedly running htpasswd (without the -c flag) and passing in new account names each time. The reason you can't just create a text file in a text editor is that the passwords are encrypted when they're saved in the file. htpasswd takes care of that.
You can also create groups of users by creating a group file. To create this file, just use your favorite text editor and use the following format:
groupname: account1 account2 account3 ...
Substitute groupname with a group name of your own choosing, such as managers, and then list all the accounts from your password file that are members of the group. Once you've set up your password file and (optionally) your group file, you're ready to start using the require directive. To set up what's referred to as an authentication realm, the following directives are used:
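AuthType: the type of authentication to use, such as Basic
AuthName: the name of the realm, which the browser displays when it prompts for a username and password
AuthUserFile: the path to the password file you created with htpasswd
AuthGroupFile: the path to the group file, if you created one
Now let's look at how these are used in the file:
AuthType Basic
AuthName "Administrator Files"
AuthUserFile /usr/local/apache/passwords
AuthGroupFile /usr/local/apache/groups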
Once you've set up the authentication realm, you can start using require directives to specify who's allowed to see the pages and who isn't. The format for require directives is as follows:
require group administrators
require user fred bob jim betty
First, you specify whether the require directive refers to users or groups, and then you include a list of usernames or group names. If you included the previous directives in your .htaccess files, the users fred, bob, jim, and betty would be able to access the pages, along with any users in the administrators group.
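Putting the pieces together, a complete .htaccess file for a password-protected directory might look like the following sketch. It assumes the password and group files live at the example paths used above and that the group file defines an administrators group; adjust both for your own server.

```apache
# Define the authentication realm
AuthType Basic
AuthName "Administrator Files"
AuthUserFile /usr/local/apache/passwords
AuthGroupFile /usr/local/apache/groups

# Let in anyone in the administrators group, plus four named users
require group administrators
require user fred bob jim betty
```

When several require directives are listed like this, a user who satisfies any one of them is allowed in, which matches the behavior described above.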
Although .htaccess files were once associated strictly with access control, their capabilities were eventually expanded to encompass nearly all of the configuration directives for Apache. There's a full list of configuration directives for Apache 1.3 at http://httpd.apache.org/docs/mod/directives.html.
If you're using Apache 2.0, you can use http://httpd.apache.org/docs/2.0/mod/directives.html.
I want to talk about two in particular: Redirect and RedirectMatch. First, let's talk a bit about redirection. Redirecting users from one URL to another is all too common. For example, let's say you have a directory called aboutus on your website, and you want to move everything to a directory called about. One common way of handling things so that your users don't get a dreaded 404 Not Found error when they go to the old URL is to put an index.html file in that directory that looks like this:
<html>
<head>
<title>Moved</title>
<meta http-equiv="refresh" content="1; url=http://www.example.com/about" />
</head>
<body>
<p>This page has moved to a <a href="/about/">new location</a>.</p>
</body>
</html>
The <meta> tag basically tells the browser to wait one second and then proceed to the URL specified. The content on the page is there just to handle the rare case in which the user's browser doesn't do what the <meta> tag tells it to. This is how many people handle these sorts of cases.
There's one obvious problem, though. Let's say you had many pages in the aboutus directory. You'd have to create pages like this one to replace each of them. The other, slightly less obvious problem, is that these tags mess with the Back button. Unless users are careful, they'll go back to the redirect page, get redirected to the new page again, go back again, get redirected again, and on and on. To get back past a page that uses a <meta> tag to redirect users, you have to click the Back button twice in rapid succession.
Using the Redirect directive, you can solve this problem much more elegantly. Here's how it works:
Redirect /aboutus http://www.example.com/about
Any requests for URLs that begin with /aboutus will be redirected to the /about directory. Apache will also take everything after the path specified in the Redirect directive and append it to the target URL. So, a request for /aboutus/management.html will be redirected to http://www.example.com/about/management.html. This makes the redirection completely transparent to the user. You can even specify whether the redirect is permanent or temporary like this:
Redirect temp /aboutus http://www.example.com/about
Redirect permanent /aboutus http://www.example.com/about
Redirect seeother /aboutus http://www.example.com/about
You can indicate that /aboutus is gone without redirecting the user like this:
Redirect gone /aboutus
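For reference, each status keyword corresponds to an HTTP status code that Apache sends to the browser. The sketch below annotates the Redirect variants with those codes. The lines are alternatives, so all but the first are commented out; in practice you would pick just one for a given path.

```apache
# temp: 302 Found (temporary; this is the default when no status is given)
Redirect temp /aboutus http://www.example.com/about
# permanent: 301 Moved Permanently
# Redirect permanent /aboutus http://www.example.com/about
# seeother: 303 See Other
# Redirect seeother /aboutus http://www.example.com/about
# gone: 410 Gone (no target URL is given)
# Redirect gone /aboutus
```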
If you need to match something other than the beginning of the URL, you can use RedirectMatch, which uses regular expressions as the matching part of the URL. There's no space here to discuss regular expressions, but I'll give you one example of how RedirectMatch works. Let's say you replace all of your .html files on your site with .php files so that you can use PHP includes for navigation. Here's the rule:
RedirectMatch (.*)\.html$ http://www.example.com$1.php
Let me break that down. It basically says that any URL ending in .html should be redirected to the same URL on the server http://www.example.com except that the .html ending should be replaced with .php. First, let's look at the URL to match, from end to beginning. It ends with $, which indicates that the string to be matched must be at the end of the URL. The \.html is the string to match. The \ is in there to indicate that the . should actually appear in the URL, and not be treated as a regular expression metacharacter. The (.*) says "match everything up to the .html." The .* matches everything, and the parentheses indicate that the regular expression should remember what was matched. In the target URL, the part of the URL that was matched is retrieved and plugged in.
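RedirectMatch accepts the same optional status keyword as Redirect. As a sketch, assuming the move from .html to .php is meant to be permanent, the rule could be written like this:

```apache
# Permanently redirect any URL ending in .html to the same path ending in .php
RedirectMatch permanent (.*)\.html$ http://www.example.com$1.php
```

With this rule, a request for /about/management.html would be redirected to http://www.example.com/about/management.php.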