
Uniform resource locators (URLs)

Uniform resource locators (URLs) are a way of unambiguously describing the locations of Internet resources.

The uniform resource locator (URL) is a fundamental part of the Web. It unambiguously identifies both the protocol used to reach an Internet resource and the location of that resource. In general, a URL has the following form:

scheme://"common syntax"/path

Using URLs as a standard, Internet client programs like Web browsers can interpret URLs and retrieve the desired information. URLs describe the protocols and locations of Internet resources without regard to the particular Internet client software the user is employing to access them.

Each part of a URL (scheme, "common syntax", and path) is described in the sections below.
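To see the three parts side by side, here is a minimal sketch using Python's standard urllib.parse module. It assumes that module's splitting rules (which follow the later generic URL syntax) line up with this chapter's scheme / "common syntax" / path division; the function name url_parts is hypothetical.

```python
from urllib.parse import urlsplit

def url_parts(url: str) -> tuple[str, str, str]:
    """Split a URL into (scheme, common syntax, path)."""
    parts = urlsplit(url)
    # urlsplit separates any ?query (and #fragment) from the path;
    # glue the query back on so "path" means everything after the common syntax.
    path = parts.path + (f"?{parts.query}" if parts.query else "")
    return parts.scheme, parts.netloc, path

print(url_parts("http://www.lib.ncsu.edu/stacks/alawon-index.html"))
```

Note that urlsplit reports the common syntax without its surrounding slashes and discards any #fragment.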

Scheme

"Scheme" denotes the type of Internet resource, and it is always followed by a colon (:). The most common schemes include:

ftp - file transfer protocol
file - a local file
gopher - gopher protocol
http - hypertext transfer protocol
mailto - electronic mail address
news - Usenet news
telnet and tn3270 - interactive sessions
wais - wide area information servers

Other, less common schemes include:

cid - content identifiers for MIME body parts
mid - message identifiers for electronic mail
nntp - Usenet news for local NNTP access only
prospero - access using the prospero protocols
rlogin - another type of interactive session

In the future we may even see schemes like z3950 or whois to denote Z39.50 query/retrieval services or whois databases, respectively.

Examples

http:
file:
wais:
ftp:
gopher:
news:
nntp:
telnet:
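Because the scheme is always followed by a colon, a program can find it by taking everything before the first colon. The sketch below does that and checks the result against the two lists above. The helper names are hypothetical, and the sketch assumes every input actually begins with a scheme (a bare "host:port" string would be misread).

```python
# Scheme lists copied from the text above.
COMMON_SCHEMES = {"ftp", "file", "gopher", "http", "mailto", "news",
                  "telnet", "tn3270", "wais"}
LESS_COMMON_SCHEMES = {"cid", "mid", "nntp", "prospero", "rlogin"}

def scheme_of(url: str) -> str:
    """Return the scheme: everything before the first colon, lowercased."""
    scheme, sep, _rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme in {url!r}")
    return scheme.lower()

def classify(url: str) -> str:
    """Label a URL's scheme as common, less common, or unknown."""
    scheme = scheme_of(url)
    if scheme in COMMON_SCHEMES:
        return "common"
    if scheme in LESS_COMMON_SCHEMES:
        return "less common"
    return "unknown"

for example in ["http://www.lib.ncsu.edu/", "nntp:", "z3950://host/"]:
    print(example, "->", scheme_of(example), classify(example))
```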

"Common Syntax"

A pair of slashes (//) and a trailing slash (/) surround a "common syntax" for those schemes "which refer to Internet protocols." The structure of this common syntax depends on the scheme, but it can be divided into three parts:

User names and passwords
The common syntax may begin with a user name and password, separated by a colon (:). This is useful for FTP services that expect the word "guest" rather than "anonymous" as part of the log-on procedure. Similarly, it is useful for telling the end user how to log into a remote telnet session. The user name and password combination is separated from the rest of the "common syntax" by an at sign (@).

Host
The Internet name or IP (Internet Protocol) address is the next part of the "common syntax". Examples include "sumex-aim.stanford.edu", "ftp.lib.ncsu.edu", or "152.1.24.177".

Port
Internet communications take place over "ports". A port is a numbered endpoint on a host, so a single machine can offer many services, each on its own port. Think of ports like telephone number extensions. Many times you are given a telephone number with an extension (1 800 555 1212 x1234). You know this means to call the 800 number and ask for extension 1234. Internet ports work the same way: you communicate with a remote machine (152.1.24.177) and ask for a connection to port 80, or some other port. In most cases, the port number is implied by the scheme. For example, the telnet protocol assumes you want port 23. Gopher assumes port 70. HTTP assumes port 80. WAIS (and Z39.50) assume port 210. But sometimes an information-service provider may lack the authority to use the standard port, or may not want to. In these cases, the port number must be specified, preceded by a colon to separate it from the rest of the "common syntax." A good example is the Geographic Name Server, which resides at martini.eecs.umich.edu on port 3000, so it would be denoted as martini.eecs.umich.edu:3000
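The three parts of the common syntax can be pulled apart mechanically. This minimal sketch relies on the username, password, hostname and port attributes of Python's urllib.parse.urlsplit result, and falls back to the default ports named above when no port is given. The DEFAULT_PORTS table holds only the schemes the text mentions, so other schemes (ftp, for instance) come back with no port; the function name common_syntax is hypothetical.

```python
from urllib.parse import urlsplit

# Default ports named in the text; other schemes would need their own entries.
DEFAULT_PORTS = {"telnet": 23, "gopher": 70, "http": 80, "wais": 210}

def common_syntax(url: str) -> dict:
    """Break the //user:password@host:port/ part of a URL into its pieces."""
    parts = urlsplit(url)
    port = parts.port  # None when no explicit :port is given
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)  # fall back to the scheme's default
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,  # note: urlsplit lowercases host names
        "port": port,
    }

print(common_syntax("telnet://martini.eecs.umich.edu:3000/"))
print(common_syntax("gopher://gopher.lib.ncsu.edu/"))
```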

Examples

scheme://library.ncsu.edu/ - connect to the host library.ncsu.edu and use the default port defined by the scheme
scheme://library@library.ncsu.edu/ - connect to library.ncsu.edu with the username library and use the default port defined by the scheme
scheme://anonymous@ftp.lib.ncsu.edu/ - connect to ftp.lib.ncsu.edu with the username anonymous and use the default port defined by the scheme
scheme://martini.eecs.umich.edu:3000/ - connect to martini.eecs.umich.edu on port 3000
scheme://gopher.lib.ncsu.edu:70/ - connect to gopher.lib.ncsu.edu on port 70
scheme://www.lib.ncsu.edu:80/ - connect to www.lib.ncsu.edu on port 80
scheme:/// - make no Internet connection; use the local machine
scheme://80/ - invalid; host names are a prerequisite when using ports
scheme://anonymous/ - invalid; again, a host name needs to be included
scheme://root:asecret@my.machine:23/ - syntactically valid, but publishing a root password in a URL is "jus' plain 'ole stupid"
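Going the other way, the rules illustrated by these examples can be encoded when assembling a URL: a user name, password or port is only meaningful when a host is present, and an empty host yields scheme:/// (the local machine). The function below is a hypothetical sketch of those rules; it does not escape special characters in any of its inputs.

```python
def build_url(scheme, host, path="/", user=None, password=None, port=None):
    """Assemble scheme://[user[:password]@]host[:port]path.

    Follows the examples above: user names, passwords and ports
    require a host; an empty host means "use the local machine".
    """
    if not host and (user or password or port is not None):
        raise ValueError("a host name is required with a user name, password, or port")
    if password and not user:
        raise ValueError("a password needs a user name")
    login = ""
    if user:
        # user and password are joined by ":" and set off from the host by "@"
        login = user + (f":{password}" if password else "") + "@"
    hostport = host + (f":{port}" if port is not None else "")
    return f"{scheme}://{login}{hostport}{path}"

print(build_url("scheme", "library.ncsu.edu", user="library"))
print(build_url("scheme", "martini.eecs.umich.edu", port=3000))
print(build_url("scheme", ""))
```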

Path

The last part of a URL is the path. It includes all the text after the "common syntax." Simplistically speaking, think of the path statement as a list of folders followed by a file name. Like the "common syntax", the structure of the path section depends on the URL's scheme. The structure of ftp, http, wais, and gopher path statements is described below. Neither telnet- nor tn3270-based URLs use path statements.

FTP path statements

The path statements of FTP URLs are the easiest to understand. They form the basis of all other path statements. The path statement of an FTP URL takes the form of a directory (folder) structure with an optional file name appended to the end. Like this:

ftp://hostname.edu/folder/subfolder/sub-subfolder/filename.txt

In other words, this URL specifies the location of a file named "filename.txt" residing in the directory "sub-subfolder", which is in the directory "subfolder", which is in turn in the directory "folder."

Here is a real world example:

ftp://ftp.lib.ncsu.edu/pub/stacks/alawon/alawon-v1n04

This URL denotes the following actions:

FTP to ftp.lib.ncsu.edu
Log on as anonymous
Change the directory to /pub/stacks/alawon/
Get the file alawon-v1n04

You do not have to specify a filename in an FTP-based URL; you can specify only the directory path. In that case, you must end your URL with a trailing slash (/), as in:

ftp://ftp.lib.ncsu.edu/pub/software/mac/

If you do so, then the URL tells your URL-capable application to simply list all the files in the designated directory. Incidentally, you do not have to specify a path statement in an FTP-based URL either. By omitting the path statement, your Internet application should retrieve a list of the filenames of the root directory of the remote FTP archive.
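The FTP rules above (log on as anonymous unless a user is given, retrieve a file, list a directory when the path ends in a slash, list the root when there is no path) can be turned into a step list. This is a minimal sketch under the chapter's simplified model of FTP paths; it only describes the actions and makes no network connection. The function name ftp_plan is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def ftp_plan(url: str) -> list[str]:
    """Translate an ftp:// URL into the steps a client would carry out."""
    parts = urlsplit(url)
    steps = [f"FTP to {parts.hostname}",
             f"Log on as {parts.username or 'anonymous'}"]
    path = unquote(parts.path)
    if path in ("", "/"):
        # no path statement: list the root directory of the archive
        steps.append("List the files in the root directory")
    elif path.endswith("/"):
        # trailing slash: the path names a directory, so list it
        steps.append(f"Change the directory to {path}")
        steps.append("List the files in that directory")
    else:
        # otherwise the last segment is a file name
        directory, filename = posixpath.split(path)
        steps.append(f"Change the directory to {directory.rstrip('/')}/")
        steps.append(f"Get the file {filename}")
    return steps

for step in ftp_plan("ftp://ftp.lib.ncsu.edu/pub/stacks/alawon/alawon-v1n04"):
    print(step)
```

Run on the real-world example above, the sketch reproduces the four actions listed earlier in this section.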

HTTP path statements

Here is an example of a URL for an HTML document. It has the exact same structure as the FTP-based URL path statements:

http://www.lib.ncsu.edu/stacks/alawon-index.html

This URL opens an HTTP connection to www.lib.ncsu.edu and requests the file alawon-index.html from the stacks directory.
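Unlike an FTP client, a Web client does not actually change directories; it sends the whole path to the server in a single request. The sketch below builds the text of a minimal HTTP/1.0 GET request for an http URL, assuming port 80 when none is given. It only constructs the request string and does not open a connection; the function name http_request is hypothetical.

```python
from urllib.parse import urlsplit

def http_request(url: str) -> tuple:
    """Return (host, port, request text) for a minimal HTTP/1.0 GET."""
    parts = urlsplit(url)
    target = parts.path or "/"       # an empty path asks for the server's root document
    if parts.query:
        target += "?" + parts.query  # search arguments travel along with the path
    host = parts.hostname
    port = parts.port or 80          # HTTP assumes port 80
    request = (f"GET {target} HTTP/1.0\r\n"
               f"Host: {host}\r\n"   # optional in HTTP/1.0, required in HTTP/1.1
               "\r\n")
    return host, port, request

host, port, request = http_request("http://www.lib.ncsu.edu/stacks/alawon-index.html")
print(host, port)
print(request)
```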

Sometimes the path statements of HTTP-based URLs contain path and/or search arguments. The dollar sign ($) and the question mark (?) denote these elements, respectively, as illustrated below:

http://hostname.edu/hello-world-03.script?TryThisAtHome
http://hostname.edu/hello-world-03.script$ScriptingIsFun
http://hostname.edu/hello-world-03.script$communication?WWWServersAreAbout...

You will rarely need to type a URL containing path and/or search arguments by hand. These elements are generated automatically by WWW browser applications when sending input to WWW-based scripts such as common gateway interface (CGI) scripts or imagemapping programs. (For more information about the search and path arguments, see the section entitled "Adding rudimentary input to your .script files" in the chapter WWW Scripting.)
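A script receiving such a URL has to separate the two kinds of argument. Following the convention described above ($ introduces a path argument, ? a search argument), this hypothetical sketch splits on the question mark first, so a dollar sign inside the search argument is left alone. It returns the arguments still URL-encoded; a real script would probably decode them as well.

```python
from typing import Optional, Tuple

def script_arguments(url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a script URL into (script URL, path argument, search argument)."""
    rest, qmark, search = url.partition("?")      # search argument follows "?"
    script, dollar, path_arg = rest.partition("$")  # path argument follows "$"
    return script, (path_arg if dollar else None), (search if qmark else None)

print(script_arguments("http://hostname.edu/hello-world-03.script?TryThisAtHome"))
print(script_arguments("http://hostname.edu/hello-world-03.script$ScriptingIsFun"))
```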

WAIS path statements

WAIS searches can be specified using URLs. Unfortunately, at the present time, only NCSA Mosaic for the X Window System directly implements the WAIS protocol. WAIS URLs have the following form:

wais://host:port/database?query

"Port" is assumed to be 210 (the standard WAIS/Z39.50 port) when omitted, "database" is the source file to search, "?" separates the database from the query, and "query" is your search strategy. Here is an example of a URL for a WAIS search:

wais://vega.lib.ncsu.edu/alawon.src?nren
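The four pieces of a WAIS URL can be read off with Python's urllib.parse, as in this minimal sketch. It assumes port 210 when none is given and decodes %-escapes in the query; the function name wais_search is hypothetical, and the sketch only parses the URL rather than performing a WAIS search.

```python
from urllib.parse import urlsplit, unquote

def wais_search(url: str) -> dict:
    """Pull host, port, database and query out of wais://host:port/database?query."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 210,           # standard WAIS/Z39.50 port
        "database": parts.path.lstrip("/"),  # the source file to search
        "query": unquote(parts.query),       # decode escapes such as %20
    }

print(wais_search("wais://vega.lib.ncsu.edu/alawon.src?nren"))
```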

Gopher path statements

Gopher servers and files can be specified with URLs as well. Since gopher resource specifications require "Type" identifiers, and since the paths to gopher resources often include spaces, gopher URLs usually deviate from the norm. Here is an example of a URL for a gopher subdirectory:

gopher://gopher.lib.ncsu.edu/11/library/

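Notice the pair of 1's after the Internet name of the computer. The first 1 is the gopher item type, which marks the resource as a directory; the second 1 is the first character of the selector string this particular server uses for that directory. On the other hand, the following URL points to a text file within that directory: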

gopher://gopher.lib.ncsu.edu/00/library/about

Here the leading "0" is the item type and denotes a text file. Constructing URLs is more difficult when the path and/or file names of Internet resources contain special characters like spaces or colons. In these cases, escape codes (a percent sign followed by the character's two-digit hexadecimal code) must be used to denote the special characters. For example:

gopher://gopher.lib.ncsu.edu/0ftp%3amrcnext.cso.uiuc.edu%40/pub/etext/etext91/aesop11.txt

This long URL first asks a gopher server (gopher.lib.ncsu.edu) to FTP a file (aesop11.txt) from an anonymous FTP server (mrcnext.cso.uiuc.edu). Notice the "%3a" and "%40" in the URL. They denote a colon (":") and an at sign ("@"), respectively. Furthermore, notice the zero preceding the "ftp." It identifies the remote file as a text file.

As you can see, gopher URLs are particularly difficult to decipher. The easiest way to construct a URL for a gopher item is to access the gopher server via a Web client, traverse the gopher menus until you locate the resource, and then copy the displayed URL from the appropriate part of your client's screen.
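Deciphering a gopher URL by program is less painful: after the host, the first character of the path is the item type and the remainder is the selector, with escape codes such as %3a and %40 decoded. The sketch below assumes that layout and port 70 by default; the type table lists only a few common gopher item types, and the function name gopher_item is hypothetical.

```python
from urllib.parse import urlsplit, unquote

# A few gopher item types; "0" and "1" are the ones used in this section.
GOPHER_TYPES = {"0": "text file", "1": "directory", "7": "search", "9": "binary file"}

def gopher_item(url: str) -> dict:
    """Split a gopher URL into host, port, item type and decoded selector."""
    parts = urlsplit(url)
    path = parts.path[1:]          # drop the leading "/"
    item_type = path[:1] or "1"    # an empty path means the server's top-level menu
    selector = unquote(path[1:])   # %3a -> ":", %40 -> "@"
    return {
        "host": parts.hostname,
        "port": parts.port or 70,  # gopher assumes port 70
        "type": GOPHER_TYPES.get(item_type, "other"),
        "selector": selector,
    }

print(gopher_item("gopher://gopher.lib.ncsu.edu/11/library/"))
print(gopher_item("gopher://gopher.lib.ncsu.edu/0ftp%3amrcnext.cso.uiuc.edu%40/pub/etext/etext91/aesop11.txt"))
```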
