April 20, 2011, 3:35 p.m.
posted by pypa
Kinds of URLs
Many kinds of URLs are defined by the Uniform Resource Locator specification. (See Appendix A, "Sources for Further Information," for a pointer to the most recent version.) This section describes some of the more popular URLs and some situations to look out for when using them.
HTTP URLs are by far the most common type of URLs because they point to other documents on the Web. HTTP, which stands for Hypertext Transfer Protocol, is the protocol that World Wide Web servers use to communicate with web browsers.
HTTP URLs follow this basic URL form:
If the URL ends in a slash, the last part of the URL is considered a directory name. The file that you get using a URL of this type is the default file for that directory as defined by the HTTP server, usually a file called index.html. If the Web page you're designing is the top-level file for all of a directory's files, calling it index.html is a good idea. Putting such a file in place also keeps users from browsing a listing of the directory in which the file is located.
You also can specify the filename directly in the URL. In this case, the file at the end of the URL is the one that is loaded, as in the following examples:
Using HTTP URLs such as the following, where foo is a directory, is also usually acceptable:
In this case, because foo is a directory, this URL should have a slash at the end. Most Web servers can figure out that this is a link to a directory and redirect to the appropriate file. Some older servers, however, might have difficulties resolving this URL, so you should always identify directories and files explicitly and make sure that a default file is available if you're indicating a directory.
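If you generate links from a script, you can check for the trailing slash before you publish them. The following is a minimal Python sketch that uses the standard urllib.parse module. The host www.example.com and the helper names names_directory and with_trailing_slash are assumptions made up for illustration. The code looks only at the text of the URL; it can't tell whether the server really has a directory at that location.

```python
from urllib.parse import urlsplit


def names_directory(url: str) -> bool:
    """Return True if the URL's path is empty or ends in a slash.

    This inspects only the URL text. It cannot know how the server
    maps the path to files and directories.
    """
    path = urlsplit(url).path
    return path == "" or path.endswith("/")


def with_trailing_slash(url: str) -> str:
    """Add a slash to the path of a URL that you know names a directory."""
    if names_directory(url):
        return url
    parts = urlsplit(url)
    # Rebuild the URL with the slash added to the path only, so any
    # query string or fragment stays where it was.
    return parts._replace(path=parts.path + "/").geturl()


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in ("http://www.example.com/foo/",
                "http://www.example.com/foo/index.html",
                "http://www.example.com/foo"):
        print(url, names_directory(url))
```

Note that the last URL is reported as not naming a directory, because nothing in the URL itself says so. That's exactly the ambiguity the explicit slash removes.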
FTP URLs are used to point to files located on FTP servers, usually anonymous FTP servers; that is, the ones that allow you to log in using anonymous as the login ID and your email address as the password. FTP URLs also follow the standard URL form, as shown in the following examples:
Because you can retrieve either a file or a directory list with FTP, the restrictions on whether you need a trailing slash at the end of the URL aren't the same as with HTTP. The first URL here retrieves a listing of all the files in the foo directory. The second URL retrieves and parses the file homepage.html in the foo directory.
Navigating FTP servers using a web browser can often be much slower than navigating them using FTP itself because the browser doesn't hold the connection open. Instead, it opens the connection, finds the file or directory listing, displays the listing, and then closes down the FTP connection. If you select a link to open a file or another directory in that listing, the browser constructs a new FTP URL from the items you selected, reopens the FTP connection by using the new URL, gets the next directory or file, and closes it again. For this reason, FTP URLs are best for when you know exactly which file you want to retrieve rather than for when you want to browse an archive.
Although your browser uses FTP to fetch the file, if it's an HTML file, your browser will display it just as it would were it fetched using the HTTP protocol. Web browsers don't care how they get files. As long as they can recognize the file as HTML, either because the server explicitly says that the file is HTML or by the file's extension, browsers will parse and display that file as an HTML file. If they don't recognize it as an HTML file, no big deal. Browsers can either display the file if they know what kind of file it is or just save the file to disk.
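The decision described above (trust the server's stated type first, then fall back on the file extension) can be sketched in a few lines of Python with the standard mimetypes module. This is only an approximation of what a browser does, not any browser's actual logic. The function names guess_kind and is_html and the ftp.example.com URLs are assumptions for illustration.

```python
import mimetypes
from typing import Optional


def guess_kind(url: str, server_type: Optional[str] = None) -> Optional[str]:
    """Return a media type for the URL, or None if it can't be determined.

    server_type is whatever type the server reported, if any. An FTP
    server normally reports nothing, so the extension decides.
    """
    if server_type:
        # Drop parameters such as "; charset=utf-8".
        return server_type.split(";")[0].strip().lower()
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed


def is_html(url: str, server_type: Optional[str] = None) -> bool:
    """Return True if the file would be treated as HTML."""
    return guess_kind(url, server_type) == "text/html"


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(is_html("ftp://ftp.example.com/foo/homepage.html"))
    print(is_html("ftp://ftp.example.com/foo/archive.zip"))
```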
All the FTP URLs in the preceding section are used for anonymous FTP servers. You also can specify an FTP URL for named accounts on an FTP server, like the following:
In this form of the URL, the username part is your login ID on the server, and password is that account's password. Note that no attempt is made to hide the password in the URL. Be very careful that no one is watching you when you're using URLs of this form, and don't put them into links that someone else can find!
Furthermore, the URLs that you request might be cached or logged somewhere, either on your local machine or on a proxy server between you and the site you're connecting to. For that reason, it's probably wise to avoid using this type of non-anonymous FTP URL altogether.
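To see how exposed the password is, consider this short Python sketch. Any program that handles the URL can read the credentials straight out of it. The second part shows one way to mask the password before a URL is written to a log. The server name ftp.example.com, the literal username and password values, and the helper name redact_password are all assumptions for illustration.

```python
from urllib.parse import urlsplit


def redact_password(url: str) -> str:
    """Return the URL with any password replaced by ***.

    URLs without a password are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return parts._replace(netloc=netloc).geturl()


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    url = "ftp://username:password@ftp.example.com/foo/homepage.html"
    parts = urlsplit(url)
    # The credentials are ordinary, readable fields of the URL.
    print(parts.username, parts.password)
    print(redact_password(url))
```

Masking helps only with copies you control. It does nothing for a proxy or cache that already saw the original URL.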
The mailto URL is used to send electronic mail. If the browser supports mailto URLs, when a link that contains one is selected, the browser will prompt you for a subject and the body of the mail message, and send that message to the appropriate address when you're done. Depending on how the user's browser and email client are configured, mailto links might not work at all for them.
The mailto URL is different from the standard URL form. It looks like the following:
Here's an example:
If your email address includes a percent sign (%), you'll have to use the escape sequence %25 instead, because the percent sign is a special character in URLs.
Unlike the other URLs described here, the mailto URL works strictly on the client side. The mailto link just tells the browser to compose an email message to the specified address. It's up to the browser to figure out how that should happen. Most browsers will also let you add a default subject to the email by including it in the URL like this:
mailto:[email protected]?subject=Hi there!
When the user clicks on the link, most browsers will automatically stick Hi there! in the subject of the message. Some even support putting body text for the email message in the link, like this:
mailto:[email protected]?subject=Hi there!&body=Body text.
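If you build mailto links in a script, it's safer to escape the address, subject, and body than to paste them in raw, because spaces, percent signs, and ampersands all have special meaning in a URL. Here's a minimal Python sketch that uses urllib.parse.quote. The function name build_mailto and the address someone@example.com are assumptions for illustration.

```python
from urllib.parse import quote


def build_mailto(address: str, subject: str = "", body: str = "") -> str:
    """Build a mailto URL with an optional subject and body.

    Everything is percent-encoded except the @ in the address, so a
    percent sign in the address comes out as %25.
    """
    url = "mailto:" + quote(address, safe="@")
    fields = []
    if subject:
        fields.append("subject=" + quote(subject, safe=""))
    if body:
        fields.append("body=" + quote(body, safe=""))
    if fields:
        url += "?" + "&".join(fields)
    return url


if __name__ == "__main__":
    # Hypothetical address, for illustration only.
    print(build_mailto("someone@example.com", "Hi there!", "Body text."))
```

With the values shown, the subject comes out as Hi%20there%21, which the mail program decodes back to Hi there! when it opens the message.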
Usenet news URLs have one of two forms:
The first form is used to read an entire newsgroup, such as comp.infosystems.www.authoring.html or alt.gothic. If your browser supports Usenet news URLs (either directly or through a newsreader), it'll provide you with a list of available articles in that newsgroup.
The second form enables you to retrieve a specific news article. Each news article has a unique ID, called a message ID, which usually looks something like the following:
To use a message ID in a URL, remove the angle brackets and include the news: part:
Be aware that news articles don't exist forever; they expire and are deleted. So, a message ID that was valid at one point can become invalid a short time later. If you want a permanent link to a news article, you should just copy the article to your web presentation and link it as you would any other file.
Both forms of URL assume that you're reading news from an NNTP server, and they can be used only if you have defined an NNTP server somewhere in an environment variable or preferences file for your browser. Therefore, news URLs are most useful simply for reading specific news articles locally, not necessarily for using in links in pages.
News URLs, like mailto URLs, might not be supported by all browsers.
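The two forms of news URL, and the step of removing the angle brackets from a message ID, can be sketched in Python as follows. The function names and the message ID used in the example are made up for illustration; a real message ID comes from the header of the article you want to link to.

```python
def news_url_for_group(newsgroup: str) -> str:
    """First form: a URL for an entire newsgroup."""
    return "news:" + newsgroup.strip()


def news_url_for_article(message_id: str) -> str:
    """Second form: a URL for one article, given its message ID.

    The angle brackets around the ID, if present, are removed.
    """
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    return "news:" + bare


if __name__ == "__main__":
    print(news_url_for_group("alt.gothic"))
    # Hypothetical message ID, for illustration only.
    print(news_url_for_article("<abc123@news.example.com>"))
```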
File URLs are intended to reference files contained on the local disk. In other words, they refer to files located on the same system as the browser. For local files, file URLs take one of two forms: one with an empty hostname (three slashes rather than two), the other with localhost as the hostname:
Depending on your browser, one or the other will usually work. (If you're in doubt, you can open the file from within your browser and look at the address bar to see what its file: URL is.)
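As a rough illustration of the two forms, this Python sketch builds both from an absolute pathname and checks whether a given file URL refers to the local system. It assumes Unix-style paths that start with a slash. The path /dir1/dir2/file and the function names file_url and is_local_file_url are placeholders, not names from any real system.

```python
from urllib.parse import quote, urlsplit


def file_url(path: str, use_localhost: bool = False) -> str:
    """Build a file URL from an absolute, Unix-style pathname."""
    if not path.startswith("/"):
        raise ValueError("a file URL needs an absolute pathname")
    host = "localhost" if use_localhost else ""
    # With an empty host, the two slashes after "file:" run into the
    # leading slash of the path, which gives the three-slash form.
    return "file://" + host + quote(path)


def is_local_file_url(url: str) -> bool:
    """Return True for a file URL whose host is empty or localhost."""
    parts = urlsplit(url)
    return parts.scheme == "file" and parts.netloc in ("", "localhost")


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(file_url("/dir1/dir2/file"))
    print(file_url("/dir1/dir2/file", use_localhost=True))
```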
File URLs are very similar to FTP URLs. In fact, if the host part of a file URL is neither empty nor localhost, your browser will try to find the given file by using FTP. Both of the following URLs result in the same file being loaded in the same way:
Probably the best use of file URLs is in startup pages for your browser (which are also called home pages). In this instance, because you'll almost always be referring to a local file, using a file URL makes sense.
The problem with file URLs is that they reference local files, where local means on the same system as the browser pointing to the file, not the same system from which the page was retrieved! If you use file URLs as links in your page, and someone from elsewhere on the Internet encounters your page and tries to follow those links, that person's browser will attempt to find the file on her local disk (and generally will fail). Also, because file URLs use the absolute pathname to the file, if you use file URLs in your page, you can't move that page elsewhere on the system or to any other system.
If your intention is to refer to files that are on the same file system or directory as the current page, use relative pathnames rather than file URLs. With relative pathnames for local files and other URLs for remote files, you shouldn't need to use a file URL at all.
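To see why relative pathnames travel better than file URLs, you can resolve the same link against two different page locations with Python's urllib.parse.urljoin, which follows the same rules a browser uses to resolve a link against the page that contains it. All of the URLs here are hypothetical, as is the helper name resolve_from.

```python
from urllib.parse import urljoin


def resolve_from(page_urls, link):
    """Resolve one link as it would be followed from each page URL."""
    return [urljoin(page, link) for page in page_urls]


if __name__ == "__main__":
    # Hypothetical locations of the same page, before and after a move.
    pages = [
        "http://www.example.com/foo/index.html",
        "http://mirror.example.org/archive/foo/index.html",
    ]
    # A relative pathname follows the page wherever it goes.
    print(resolve_from(pages, "pictures/cat.gif"))
    # A file URL is absolute, so it points at the reader's own disk
    # no matter where the page came from.
    print(resolve_from(pages, "file:///home/me/pictures/cat.gif"))
```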