Writing Protocol Support Plug-Ins for Grail


Introduction

A protocol support plug-in allows the Grail browser to understand a new URL scheme. The plug-in implements an interface that allows Grail to retrieve a resource given its URL. The interface breaks the retrieval process into several steps:

  1. create a new protocol handler to retrieve the resource,
  2. retrieve the document's metadata (e.g. its type, size, last modification time),
  3. retrieve the resource's actual data.

Protocol support plug-ins have to operate under relatively adverse circumstances: they must be able to perform their I/O tasks asynchronously, without blocking the main application if at all possible. Threads are not available.

The generic protocol API is designed to support the HTTP protocol, because it has the richest set of access methods and response codes. Other protocols use the same interface, but typically support a limited set of access methods and response codes; for example, the ftp protocol handler uses only a single access method and a single response code.

Location of Protocol Support Plug-In Modules

A protocol plug-in module is imported when the protocol is first referenced. The module file, residing in the Grail protocol plug-ins directory or in the Grail source directory, should be named schemeAPI.py where scheme is the URL prefix used by the protocol, in lowercase. For instance, since HTTP URLs begin with http:, the HTTP protocol implementation is contained in the file httpAPI.py. Non-alphanumeric characters in scheme names are translated to underscores. Protocol schemes should not start with a digit.
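
The naming rule above can be sketched as a pair of small helpers. Both function names are hypothetical and are not part of Grail; they only illustrate the lowercase and underscore translation, under the assumption that the access function name is sanitized the same way as the module name (as the x_http example later in this document suggests).

```python
import re

def sanitize_scheme(scheme):
    """Lowercase a scheme and map non-alphanumeric characters to '_'."""
    return re.sub(r"[^a-z0-9]", "_", scheme.lower())

def module_name_for_scheme(scheme):
    """Return the plug-in module name, e.g. 'httpAPI' (illustrative only)."""
    return sanitize_scheme(scheme) + "API"

def access_name_for_scheme(scheme):
    """Return the access function name, e.g. 'http_access' (illustrative only)."""
    return sanitize_scheme(scheme) + "_access"
```

Under these assumptions, the scheme x-http maps to the module x_httpAPI and the access function x_http_access.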

The Grail protocol plug-ins directory is a subdirectory named protocols of the Grail user directory. On Unix, the Grail user directory is given by the environment variable $GRAILDIR, or, if that variable is undefined or empty, the subdirectory named .grail of the user's home directory.
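
As a rough illustration of the lookup described above, the following sketch computes the plug-ins directory on Unix. The function name grail_protocols_dir and the use of $HOME to find the home directory are assumptions made for this example, not Grail's actual code.

```python
import os

def grail_protocols_dir(environ=None):
    """Return the protocol plug-ins directory on Unix (illustrative only)."""
    if environ is None:
        environ = os.environ
    # $GRAILDIR wins when it is set and non-empty.
    grail_dir = environ.get("GRAILDIR")
    if not grail_dir:
        # Otherwise fall back to ~/.grail (HOME lookup is an assumption).
        home = environ.get("HOME") or os.path.expanduser("~")
        grail_dir = os.path.join(home, ".grail")
    return os.path.join(grail_dir, "protocols")
```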

Contents of Protocol Support Plug-In Modules

A protocol module may define any number of objects for its own use, but it must define a function scheme_access that is invoked as

    scheme_access(url, method, params)

and returns a protocol handler object.

Normally, scheme_access is a class definition whose __init__() method has the required arguments. However, it is also allowed to have a function scheme_access() that returns an instance of some other class (perhaps a different class depending on the syntax of the url argument).
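
The function form might look like the sketch below. The eggs scheme, both handler classes, and the rule of dispatching on a trailing slash are invented for illustration; only the (url, method, params) signature comes from the interface.

```python
class eggs_dir_access:
    """Hypothetical handler for 'eggs:' URLs that name a directory."""

    def __init__(self, url, method, params):
        self.url, self.method, self.params = url, method, params

class eggs_file_access:
    """Hypothetical handler for all other 'eggs:' URLs."""

    def __init__(self, url, method, params):
        self.url, self.method, self.params = url, method, params

def eggs_access(url, method, params):
    """Factory: pick a handler class based on the syntax of the url."""
    if url.endswith("/"):
        return eggs_dir_access(url, method, params)
    return eggs_file_access(url, method, params)
```

Callers cannot tell the two forms apart, since both are invoked with the same three arguments and both return a handler object.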

The arguments to scheme_access are:

The scheme_access() call may raise an IOError exception for serious errors, such as illegal URL syntax, host not found, or socket errors. (Other exceptions should be caught and translated to IOError if they fall in this category.)
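
A minimal sketch of translating an internal error into IOError is shown below. The ham scheme, the helper split_host_path, and the assumption that the url argument arrives as "//host/path" with the scheme prefix already removed are all hypothetical.

```python
def split_host_path(url):
    """Split '//host/path' into (host, path); raise ValueError when malformed."""
    if not url.startswith("//"):
        raise ValueError("expected //host/path, got %r" % (url,))
    host, _, path = url[2:].partition("/")
    if not host:
        raise ValueError("missing host in %r" % (url,))
    return host, "/" + path

class ham_access:
    """Hypothetical handler whose constructor reports bad URLs as IOError."""

    def __init__(self, url, method, params):
        try:
            self.host, self.path = split_host_path(url)
        except ValueError as exc:
            # Translate the internal error into the exception callers expect.
            raise IOError("ham: illegal URL syntax: %s" % (exc,))
        self.method = method
        self.params = params
```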

The object returned by scheme_access() is a protocol implementation object, valid for one transaction only. (If the protocol can in fact re-use connections for subsequent transactions, such as the FTP protocol, it should maintain connection information internally.)

Protocol Handler Objects

Protocol handler objects implement a finite state machine with the following states: META, DATA, DONE. The initial state is META. A call to the getmeta() method causes a transition to DATA state. In the DATA state, calls to the getdata() method can cause the object to remain in that state or to move into the DONE state. In the DONE state, no methods should be called. The call to getdata() that causes the state change to DONE returns an empty string.
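
The transitions can be illustrated with a skeleton that tracks only the state and serves canned data in place of real I/O. The class name and the representation of states as strings are assumptions made for this sketch.

```python
META, DATA, DONE = "META", "DATA", "DONE"

class StateTrackingHandler:
    """Illustrative skeleton: canned data, no real I/O."""

    def __init__(self, chunks):
        self.state = META
        # Drop empty chunks so "" is returned only at end of stream.
        self._chunks = [c for c in chunks if c]

    def getmeta(self):
        assert self.state == META
        self.state = DATA
        return 200, "OK", {}

    def getdata(self, maxnbytes):
        assert self.state == DATA
        assert maxnbytes > 0
        if not self._chunks:
            # End of stream: return the empty string and move to DONE.
            self.state = DONE
            return ""
        chunk = self._chunks.pop(0)
        if len(chunk) > maxnbytes:
            # Keep the unread tail for the next call.
            self._chunks.insert(0, chunk[maxnbytes:])
        return chunk[:maxnbytes]

    def close(self):
        self.state = DONE
```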

Protocol handler objects should provide the following methods. The required initial state and any state changes caused by the call are listed for each method, as well as the possible exceptions.

Protocol Handler Object Methods

fileno() -> int
Return the file number of the socket or file associated with the connection; or -1 if no file number is available. This may be used by the caller only in select() system calls.
States: META, DATA.
No state changes.
No exceptions.

pollmeta() -> (statusmessage, readyflag)
Return a short string describing the protocol status in user-readable form (e.g. for display in a progress window), and a flag indicating whether a call to getmeta() could return without blocking for I/O.
States: META.
No state changes.
Exceptions: IOError, if an irrecoverable error occurs.

getmeta() -> (errcode, errmessage, headers)
Return a numeric error code, an error message, and a set of headers representing additional information returned by the server. The interpretation of headers is the same as that used by HTTP. The error codes are also the same as those returned by HTTP; in particular, 200 is success and anything else is partial failure (though getdata() should still be called).
States: META.
State change: to DATA.
Exceptions: IOError, if an irrecoverable error occurs.
Additional notes: the headers object doesn't have to be a real dictionary, as long as it supports the has_key() and keys() methods and the x[key] indexing notation. It needn't support modification. Keys should be given in all lowercase.
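
For instance, a read-only headers object satisfying these requirements could be sketched as follows. The class name ReadOnlyHeaders and its constructor taking (name, value) pairs are assumptions for illustration.

```python
class ReadOnlyHeaders:
    """Minimal headers object: has_key(), keys() and x[key] only."""

    def __init__(self, pairs):
        self._items = {}
        for name, value in pairs:
            # Store keys in all lowercase, as the interface asks.
            self._items[name.lower()] = value

    def has_key(self, key):
        return key in self._items

    def keys(self):
        return list(self._items.keys())

    def __getitem__(self, key):
        return self._items[key]
```

Because no __setitem__ is defined, callers cannot modify the headers through indexing.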

polldata() -> (statusmessage, readyflag)
Like pollmeta() but for use in the DATA state.
States: DATA.
No state changes.
Exceptions: IOError, if an irrecoverable error occurs.

getdata(maxnbytes) -> string
Return at most maxnbytes of data. If not even a single byte is available, and the end of the stream has not yet been reached, this blocks until at least one byte is available or the stream is closed by the other end. When the stream is closed at the other end, this returns the empty string (and only then).
States: DATA.
State changes: remain in DATA (returning a non-empty string) or go to DONE (returning an empty string).
Exceptions: IOError, if an irrecoverable error occurs.

close() -> None
Close the connection.
States: META, DATA, DONE.
State change: to DONE.
No exceptions.
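
Taken together, the methods above suggest a caller-side loop along the following lines. This is a simplified, blocking sketch and not how Grail itself schedules I/O; CannedHandler, wait_ready, and fetch are hypothetical names, and the status message "Ready" is arbitrary.

```python
import select

class CannedHandler:
    """Hypothetical handler that serves fixed strings in place of a plug-in."""

    def __init__(self, chunks):
        self._chunks = [c for c in chunks if c]
        self.closed = False

    def fileno(self):
        return -1                      # nothing for select() to watch

    def pollmeta(self):
        return "Ready", 1

    def getmeta(self):
        return 200, "OK", {}

    def polldata(self):
        return "Ready", 1

    def getdata(self, maxnbytes):
        if not self._chunks:
            return ""
        chunk = self._chunks.pop(0)
        if len(chunk) > maxnbytes:
            self._chunks.insert(0, chunk[maxnbytes:])
        return chunk[:maxnbytes]

    def close(self):
        self.closed = True

def wait_ready(handler, poll, timeout):
    """Wait until poll() reports ready, using select() when a file number exists."""
    while True:
        message, ready = poll()
        if ready:
            return
        fd = handler.fileno()
        if fd < 0:
            return                     # cannot wait; let the get*() call block
        select.select([fd], [], [], timeout)

def fetch(handler, bufsize=8192, timeout=1.0):
    """Drive one transaction: META, then DATA until an empty string, then close."""
    try:
        wait_ready(handler, handler.pollmeta, timeout)
        errcode, errmsg, headers = handler.getmeta()
        chunks = []
        while True:
            wait_ready(handler, handler.polldata, timeout)
            data = handler.getdata(bufsize)
            if not data:
                break
            chunks.append(data)
        return errcode, errmsg, headers, "".join(chunks)
    finally:
        # close() is legal in every state, including after an exception.
        handler.close()
```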

Notes

The set of error codes that getmeta() may return is defined by the HTTP 1.0 protocol specification, in particular in the section on Status Code Definitions.

Some protocols are simple enough that there is effectively no metadata. In this case, pollmeta() should return readyflag == 1 and getmeta() should return (200, "OK", {}) immediately. Likewise, in some protocols it may be impossible to determine whether getmeta() and/or getdata() might block. In such cases, pollmeta() and/or polldata() should return readyflag == 1. If getmeta() or getdata() may succeed without select ever noticing readable bytes on the socket, fileno() should return -1.

The module nullAPI exports a class null_access whose methods behave in that way. This class can be used as a base class for simple protocol support plug-in classes (see below).
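
The conventions described above can be sketched as a stand-in class. The class simple_access is written for illustration only and is not the source of nullAPI.null_access, whose actual implementation may differ; the status message "Ready" and the state attribute are assumptions.

```python
class simple_access:
    """Stand-in for a handler with no metadata and no data (illustrative)."""

    def __init__(self, url, method, params):
        self.url, self.method, self.params = url, method, params
        self.state = "META"

    def fileno(self):
        return -1                      # select() would never see readable bytes

    def pollmeta(self):
        assert self.state == "META"
        return "Ready", 1              # getmeta() will not block

    def getmeta(self):
        assert self.state == "META"
        self.state = "DATA"
        return 200, "OK", {}

    def polldata(self):
        assert self.state == "DATA"
        return "Ready", 1              # getdata() will not block

    def getdata(self, maxnbytes):
        assert self.state == "DATA"
        self.state = "DONE"
        return ""                      # no data: end of stream at once

    def close(self):
        self.state = "DONE"
```

A subclass then only needs to override the methods whose behavior differs, typically getmeta() and getdata().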

Some URL schemes, e.g. mailto, don't return any meaningful data but interact with the user instead. This can be accomplished by returning an error code 204 from getmeta() (meaning "OK, no reply"). The user interaction should preferably be done in a modeless dialog.

Some URL schemes, e.g. hdl, CNRI's handle scheme, translate into other URL schemes. This can be accomplished by returning a 301 or 302 error code from getmeta() (meaning "resource has moved", permanently or temporarily, respectively) together with a headers dictionary that contains an entry {"location": uri}. The entire return value from getmeta() should be a triple of the following form:

    (302, "Moved", {"location": uri})

Such errors are handled by higher levels. The protocol object should not attempt to implement the redirection itself, since it may be redirecting to a different protocol or to an already cached item.

If a scheme is merely an alternative spelling for another protocol, this may be accomplished by importing the other protocol module, e.g. (presuming x-http is an old name for http):

    # File x_httpAPI.py
    import httpAPI
    x_http_access = httpAPI.http_access

Some URL schemes, e.g. http, can return authentication errors (error code 401) which mean that the request should be retried after the user has entered some authentication information. This is handled by higher levels. Specific keys are assumed to be present in the headers returned by getmeta().

When a protocol object raises an exception (IOError or otherwise) its state becomes unknown. The only legal method in this case is close().

Although some URL schemes (e.g. mailto) require interaction with the user, most don't. Those that don't are independent of the GUI toolkit used by Grail and should remain usable unchanged if Grail were rewritten with an interface other than Tkinter. This means that protocol modules shouldn't interact with the user in order to request additional information; instead, they should return error codes designed specifically for this purpose.

If a protocol is usable for communication with a proxy server (such as the HTTP protocol), its constructor should accept an alternative form of the url (first) argument: if this is a tuple of two elements, the first item specifies the host:port of the proxy server in the usual manner, and the second item is the selector string to be sent to the proxy server. In this case, the protocol module should not interpret the selector string further, as it is a full URL using any protocol scheme supported by the proxy server.
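
One way a constructor might distinguish the two forms is sketched below. The helper name split_access_target is hypothetical, and the plain form is assumed to look like "//host:port/path" with the scheme prefix already removed.

```python
def split_access_target(url):
    """Return (hostport, selector) for a plain or proxied url argument."""
    if isinstance(url, tuple) and len(url) == 2:
        # Proxy form: pass the selector through without interpreting it.
        hostport, selector = url
        return hostport, selector
    # Plain form (assumed syntax): "//host:port/path".
    if not url.startswith("//"):
        raise IOError("illegal URL syntax: %r" % (url,))
    hostport, _, path = url[2:].partition("/")
    return hostport, "/" + path
```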

Under certain circumstances it is desirable for a protocol module to get access to the application object. This can be done by executing the statement from __main__ import app.

Examples

Here's an example of a protocol that translates into another one after massaging the URL string. This is read from the file spamAPI.py.

    """Protocol Support Plug-In for 'spam:' scheme.

    This is like the 'file:' scheme but prefixes the path
    with '/tmp', so the URL 'spam:splat' is equivalent to
    'file:/tmp/splat'.
    """

    from nullAPI import null_access

    class spam_access(null_access):

        def __init__(self, url, method, params):
            null_access.__init__(self, url, method, params)
            if url[:1] != '/': url = '/' + url
            self.url = "file:/tmp" + url

        def getmeta(self):
            null_access.getmeta(self)  # assert, state change
            return 302, "Moved", {'location': self.url}

For more examples, see the protocol support modules that are part of the standard Grail source, e.g. nullAPI.py, httpAPI.py, fileAPI.py, ftpAPI.py, hdlAPI.py, mailtoAPI.py.

Bugs, Caveats

There are a number of minor difficulties with the current interface specification. For instance, the creation of the protocol handler object is completely synchronous, and the meaning of the status message returned by pollmeta() and polldata() is undefined. Also, there is an undocumented additional optional argument to scheme_access() in case method is POST; this is currently only used by the HTTP protocol.