by Marko Riedel
The goal is to implement a web server that lets the client navigate the server’s file system. It responds in one of two ways: if a file is requested, it sends the contents of the file; if a directory is requested, it sends a directory listing so that the user can navigate the file system. Figure 7 shows a page produced by the server, displayed in w3m.
We’ll be using xinetd to handle requests to the web server. This way we don’t have to worry about implementing a complete daemon that forks a child process for every request; xinetd will handle all of this for us. This is the configuration that we use.
port = 7788
socket_type = stream
wait = no
user = wwwrun
group = nogroup
server = /usr/sbin/gswebsrv.sh
It is basically self-explanatory. We specify user and group for the server as well as the port. The entry “server” points to a shell script (bash) that performs some initialization. It sets the requisite GNUstep variables, the user and the time zone and invokes the actual server that will handle the request. It is important to set these variables correctly, or else warnings by GNUstep will be output to the web server. Needless to say the home directory of wwwrun has to exist for this to work properly.
Given this setup, xinetd will listen on port 7788, fork a process for every request, connect the socket to the standard input and output of the process and invoke the shell script, which in turn sets the necessary variables and starts the actual server. The server only has to read the request from the standard input and produce a response on the standard output.
HTTP knows many different status codes, of which we’ll be using quite a few, mostly to signal errors. The client sends a request followed by a set of headers, which we’ll ignore in our server. The server sends a status code, followed by a set of headers and the response data.
These are the codes that our server will produce:
We could have done with fewer codes, but we use them here to illustrate the problem of writing a web server. All these codes with the exception of the first one are error codes, so we start with a routine that handles errors. It outputs the error code followed by a detailed description of the problem, and exits the server (recall that there is one server process for every request). The routine constructs the code description from the status code itself and the description and stores it in the string msg. These are the entries shown in the list above. Subsequently the routine builds the body of the response, whose title contains the code description. The body lists the description and any explanatory details we might have supplied. It includes the process name and the host name so that both can be identified easily during debugging. The last step is to output the status code, the content type of the message body, the HTML body itself and exit the server.
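The steps of the error routine can be sketched as follows. This is a minimal Python illustration, not the Objective-C code of the server itself; the names build_error_response and fail, the HTTP/1.0 status line and the exact HTML layout are assumptions made for the example.

```python
import html
import os
import socket
import sys


def build_error_response(code, description, details=""):
    """Return the complete response text for an error (hypothetical helper)."""
    # The code description combines the status code and its description.
    msg = f"{code} {description}"
    # Process name and host name are included to simplify debugging.
    process = os.path.basename(sys.argv[0]) if sys.argv and sys.argv[0] else "server"
    host = socket.gethostname()
    body = (
        f"<html><head><title>{html.escape(msg)}</title></head>\n"
        f"<body><h1>{html.escape(msg)}</h1>\n"
        f"<p>{html.escape(details)}</p>\n"
        f"<p>{html.escape(process)} on {html.escape(host)}</p>\n"
        "</body></html>\n"
    )
    # Status line (HTTP version assumed), content type, blank line, body.
    return f"HTTP/1.0 {msg}\r\nContent-Type: text/html\r\n\r\n{body}"


def fail(code, description, details=""):
    """Send the error to standard output and end this request's process."""
    sys.stdout.write(build_error_response(code, description, details))
    sys.stdout.flush()
    sys.exit(0)
```

Keeping the construction of the response separate from the call that exits makes the text easy to inspect without terminating the process.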
The routine main must process the two types of requests. The very first thing it needs to do is to read the request from the standard input. For this purpose we obtain the appropriate file handle and read any data that might be available. We need to know how many bytes have been read, so that we can convert the contents of the data object into a string.
We convert the data to a string and split the request into lines. We are only interested in the first line (the request itself), but we could conceivably expand the server to do additional processing on the headers that are stored in the array lines.
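Reading the request and splitting it into lines might look like this in Python. It is only a sketch under stated assumptions: the function name read_request_lines, the 64K read limit and the Latin-1 decoding are choices made for the example and do not come from the server’s source.

```python
import io
import sys


def read_request_lines(stream: io.BufferedIOBase = None, limit: int = 65536):
    """Read whatever request data is available and split it into lines."""
    if stream is None:
        # Under xinetd the socket is connected to standard input.
        stream = sys.stdin.buffer
    # read1 returns the data that is available instead of waiting for EOF.
    data = stream.read1(limit)
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    text = data.decode("latin-1")
    # The first line is the request; the remaining lines are headers.
    return text.splitlines()
```

The first element of the returned list is the request line; the rest would be available if the server were extended to process headers.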
A good request line should look like this:
GET /path/to/file HTTP/1.1
Hence we need to split it into fields so that we can check the components of the request and deliver the result. We use code that is also present in the df recipe, i.e. we obtain a character set that contains white space and a scanner whose data source is the first line we have read. We scan fields until the scanner reaches the end of the line.
A series of validity checks follows. There must be three fields: the request, the entity requested and the version of the protocol.
Our server only responds to GET requests, upon receipt of which it produces the contents of the file or a directory browser.
We will be picky about the protocol that we support: it must be either HTTP 1.0 or HTTP 1.1.
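The three request checks can be summarized in a short Python sketch. The function name check_request_line is hypothetical, and the status codes 400, 501 and 505 are standard HTTP codes chosen here for illustration; they are an assumption and not necessarily the codes the server itself uses.

```python
def check_request_line(line):
    """Validate a request line; return (code, reason, path)."""
    # Split on white space, much like scanning fields with a scanner.
    fields = line.split()
    # There must be exactly three fields: method, entity, protocol version.
    if len(fields) != 3:
        return (400, "Bad Request", None)
    method, path, version = fields
    # Only GET requests are served.
    if method != "GET":
        return (501, "Not Implemented", None)
    # Only HTTP 1.0 and HTTP 1.1 are accepted.
    if version not in ("HTTP/1.0", "HTTP/1.1"):
        return (505, "HTTP Version Not Supported", None)
    return (200, "OK", path)
```

For the sample request line shown earlier, the function returns the code 200 together with the path /path/to/file.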
This ends the series of request checks. We now know that we have a good request. However, it is entirely possible that the client has requested a file or a directory that does not exist. We get the default file manager and ask it to check the path from the request. We return “Not Found” if there is no file or directory at that path.
The last check is to determine whether we can read the file or directory.
The method isReadableFileAtPath: uses the access(2) call and works on files and directories. We send an error message if we cannot access the requested entity.
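In Python the same two checks could be written with os.path.exists and os.access, the latter being a thin wrapper around access(2). The function name check_entity and the use of 403 Forbidden for the unreadable case are assumptions made for this sketch.

```python
import os


def check_entity(path):
    """Check that the requested file or directory exists and is readable."""
    # Works for files and directories alike.
    if not os.path.exists(path):
        return (404, "Not Found")
    # os.access with R_OK asks whether the current user may read the entry.
    if not os.access(path, os.R_OK):
        return (403, "Forbidden")
    return (200, "OK")
```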
We can send a 200 OK response now that the request and the requested entity have been verified.
Start with the easy part, i.e. serving files. We use a minimal set of headers, namely the content type and the content length. The content type is application/octet-stream and the content length is obtained from the file manager.
It remains to send the contents of the file. We could read the entire file into a data object and send the file all at once, but this could result in a server process requesting a lot of memory. Instead we choose to serve the file in chunks of 64K.
We obtain a file handle for reading the file and the file handle for standard output (an effort was made to use foundation objects rather than system calls). We read one chunk after another and output the current chunk immediately. We close the file when there are no more data.
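The headers and the chunked copy can be sketched in Python as follows. The names file_headers and copy_in_chunks are hypothetical; only the content type, the content length header and the 64K chunk size are taken from the description above.

```python
import io

CHUNK_SIZE = 64 * 1024  # serve the file in chunks of 64K


def file_headers(length: int) -> bytes:
    """Return the minimal header set followed by the blank line."""
    return (
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {length}\r\n"
        "\r\n"
    ).encode("ascii")


def copy_in_chunks(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes sent."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means there is no more data.
            break
        # Output the current chunk immediately, so that at most one chunk
        # is held in memory at any time.
        dst.write(chunk)
        total += len(chunk)
    return total
```

In a real server src would be the file opened in binary mode and dst the binary standard output, e.g. sys.stdout.buffer.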
That’s it for the case of serving files. Producing a browsable directory listing requires a bit more effort. We start by ensuring that the path ends in a path delimiter and append one if this is not the case.
We start by constructing the title of the HTML document that we will serve, and output the content type and the beginning of the document (title, open body tag, background color).
We wish to have a certain feature to simplify navigation. There should be a header that displays the current directory in such a manner that directories that are higher up in the tree are clickable, e.g. if the path is /path/to/subdirectory/, then both path and to should be clickable and take the user to /path and /path/to, respectively.
The first step is to split the path into components.
A path like /path/to/subdirectory/ yields five components, the first and last of which are slashes; the root directory yields a single component. We process the inner components of the path, e.g. path, to and subdirectory. We construct the subpaths for each inner component excluding the last one, giving /path and /path/to. We output an anchor for each subpath. The anchor points to the subpath and contains the last component of the subpath for display. The last component of the complete path is not displayed in this manner because it points to the directory being displayed (that’s why we have cind<cmax-2 rather than cind<cmax-1).
It remains to display the last component, which is not clickable. There is no last component when we browse the root directory. This concludes the construction of the navigable header.
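The construction of the navigable header can be illustrated with a small Python function. It is a sketch only: the name breadcrumb is hypothetical, the path is split with str.split rather than with the path component method used by the server, and the escaping of names and URLs is an addition made for the example.

```python
import html
from urllib.parse import quote


def breadcrumb(path):
    """Build the header for a directory path such as /path/to/subdirectory/."""
    # Keep only the inner components, e.g. path, to and subdirectory.
    names = [name for name in path.split("/") if name]
    if not names:
        # The root directory has no inner components and no last component.
        return "/"
    parts = []
    subpath = ""
    # Every component except the last one becomes a clickable anchor.
    for name in names[:-1]:
        subpath += "/" + name
        parts.append(f'<a href="{quote(subpath)}">{html.escape(name)}</a>')
    # The last component is the directory being displayed; it is plain text.
    parts.append(html.escape(names[-1]))
    return "/" + "/".join(parts) + "/"
```

For /path/to/subdirectory/ the anchors point to /path and /path/to, while subdirectory is shown without a link.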
The user must be able to ascend the directory tree after descending it in search of some file or directory. We output an anchor for this purpose. If we are not browsing the root directory, we output an anchor with the title Up one level, which points to the parent directory.
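A Python sketch of this anchor follows. The function name parent_link is hypothetical and posixpath.dirname stands in for whatever method the server uses to find the parent directory; the anchor text Up one level is taken from the description above.

```python
import posixpath
from urllib.parse import quote


def parent_link(path):
    """Return the 'Up one level' anchor, or an empty string for the root."""
    # Drop the trailing path delimiter before looking for the parent.
    trimmed = path.rstrip("/")
    if not trimmed:
        # We are browsing the root directory: there is nothing above it.
        return ""
    parent = posixpath.dirname(trimmed)
    return f'<a href="{quote(parent)}">Up one level</a>'
```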
We are now ready to enumerate the contents of the directory. Start by obtaining the contents of the directory (which do not include “.” and “..”, by the way). Sort them alphabetically, but ignoring case, and fetch the enumerator of the sorted array. The string item will hold a single entry.
We produce one line of output for each item and iterate over the items with the enumerator. We construct the full path to each item and check whether it is a directory or not.
The anchor for the item points to the full path and lists the item, i.e. the last component of the full path. We mark directories with the string “[+]”, chosen because it resembles the icon that is used for directories by some graphical browsers.
We pad the line up to column sixty, so that the modification time and the size of the file line up properly when we output them, which we’ll do next.
We obtain the attributes of the current item from the file manager and extract the modification date and the size. By the way, this is where the variable TZ from the shell script comes into play. We output the date and the file size. This ends the current iteration of the loop. Note that the entire listing will be displayed as-is, because it is bracketed by PRE tags.
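The loop over the directory entries can be sketched in Python as shown below. The function names format_entry and listing_lines, the placement of the marker in front of the anchor, the date format and the width of the size column are all assumptions made for the example; the case-insensitive sort and the padding to column sixty follow the description above.

```python
import html
import os
import time
from urllib.parse import quote


def format_entry(dirpath, name, is_dir, mtime, size):
    """Format one line of the listing; dirpath must end in a slash."""
    # Directories are marked with [+]; files get blanks of the same width.
    marker = "[+] " if is_dir else "    "
    anchor = f'<a href="{quote(dirpath + name)}">{html.escape(name)}</a>'
    # Pad the visible text (marker plus name) up to column sixty.
    padding = " " * max(1, 60 - len(marker) - len(name))
    # localtime honours the TZ variable set by the start-up script.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
    return f"{marker}{anchor}{padding}{stamp} {size:>12}"


def listing_lines(dirpath):
    """Return one formatted line per directory entry, sorted ignoring case."""
    if not dirpath.endswith("/"):
        dirpath += "/"
    lines = []
    # os.listdir does not include "." and "..".
    for item in sorted(os.listdir(dirpath), key=str.lower):
        full = dirpath + item
        # lstat does not follow symbolic links, so dangling links are safe.
        info = os.lstat(full)
        lines.append(format_entry(dirpath, item, os.path.isdir(full),
                                  info.st_mtime, info.st_size))
    return lines
```

Because the tags of the anchor are not displayed, only the marker and the name count towards the sixty columns, which is why the padding is computed from the visible text.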
The last step is to close the BODY and HTML tags, flush buffers, and exit the server. Easy, wasn’t it?