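A protocol support plug-in allows the Grail browser to understand a new URL scheme. The plug-in implements an interface that allows Grail to retrieve a resource given its URL. The interface breaks the retrieval process into several steps: creating a protocol handler object for the URL, retrieving the metadata (status code, message, and headers), retrieving the data itself, and closing the handler.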
Protocol support plug-ins have to operate under relatively adverse circumstances: they must be able to perform their I/O tasks asynchronously, without blocking the main application if at all possible. Threads are not available.
The generic protocol API is designed to support the HTTP protocol, because it has the richest set of access methods and response codes. Other protocols use the same interface, but typically support a limited set of access methods and response codes; for example, the ftp protocol handler uses only a single access method and a single response code.
A protocol plug-in module is imported when the protocol is first referenced. The module file, residing in the Grail protocol plug-ins directory or in the Grail source directory, should be named schemeAPI.py, where scheme is the URL prefix used by the protocol, in lowercase. For instance, since HTTP URLs begin with http:, the HTTP protocol implementation is contained in the file httpAPI.py. Non-alphanumeric characters in scheme names are translated to underscores. Protocol schemes should not start with a digit.
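The naming rule can be restated as a small helper. The function plugin_module_name is hypothetical and not part of Grail; it is only a sketch of the rules above.

```python
import re


def plugin_module_name(scheme):
    """Map a URL scheme to its plug-in module name (hypothetical helper).

    Lowercases the scheme, translates non-alphanumeric characters to
    underscores, and appends "API"; the file is this name plus ".py".
    """
    if not scheme:
        raise ValueError("empty scheme")
    if scheme[0].isdigit():
        raise ValueError("protocol schemes should not start with a digit")
    return re.sub(r"[^a-z0-9]", "_", scheme.lower()) + "API"
```

For example, plugin_module_name("x-http") gives x_httpAPI, so that scheme would live in x_httpAPI.py.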
The Grail protocol plug-ins directory is a subdirectory named protocols of the Grail user directory. On Unix, the Grail user directory is given by the environment variable $GRAILDIR, or, if that variable is undefined or empty, the subdirectory named .grail of the user's home directory.
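A minimal sketch of that lookup on Unix, assuming a hypothetical function grail_protocols_dir that takes the environment as a mapping so it can be exercised without touching the real environment:

```python
import os


def grail_protocols_dir(environ=None):
    """Return the Grail protocol plug-ins directory (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    # $GRAILDIR wins when it is set and non-empty.
    user_dir = environ.get("GRAILDIR")
    if not user_dir:
        # Otherwise fall back to the .grail subdirectory of $HOME.
        home = environ.get("HOME") or os.path.expanduser("~")
        user_dir = os.path.join(home, ".grail")
    return os.path.join(user_dir, "protocols")
```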
A protocol module may define any number of objects for its own use, but it must define a callable named scheme_access that is invoked as scheme_access(url, method, params) and returns a protocol handler object.
Normally, scheme_access is a class definition whose __init__() method has the required arguments. However, it is also allowed to have a function scheme_access() that returns an instance of some other class (perhaps a different class depending on the syntax of the url argument).
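The function form might look like this for a hypothetical demo: scheme. The classes demo_file_handler and demo_dir_handler are invented for this sketch and are left minimal; only the dispatch in demo_access matters.

```python
class demo_file_handler:
    """Handler for URLs naming a single resource (hypothetical)."""

    def __init__(self, url, method, params):
        self.url = url
        self.method = method
        self.params = params


class demo_dir_handler(demo_file_handler):
    """Handler for URLs naming a collection (hypothetical)."""


def demo_access(url, method, params):
    # scheme_access need not be a class: this factory picks a handler
    # class based on the syntax of the url argument.
    if url.endswith("/"):
        return demo_dir_handler(url, method, params)
    return demo_file_handler(url, method, params)
```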
The arguments to scheme_access are:
- url: the URL to be opened, with the scheme: prefix stripped.
- method: the access method to retrieve the resource with. Multiple access methods, like GET and POST, are supported by the HTTP protocol; most other protocols will only support GET.
- params: a dictionary containing additional headers to be sent with the request for the resource. Again, this argument is intended primarily for use with the HTTP protocol.
The scheme_access() call may raise an IOError exception for serious errors, such as illegal URL syntax, host not found, or socket errors. (Other exceptions should be caught and translated to IOError if they fall in this category.)
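For example, a handler that parses host:port from its url argument could translate the ValueError raised by a malformed port number into IOError. The helper parse_hostport and the default port 80 are assumptions for this sketch. (In Python 3, IOError is an alias of OSError.)

```python
def parse_hostport(hostport, default_port=80):
    """Parse 'host[:port]' for a handler (hypothetical helper)."""
    host, sep, port = hostport.partition(":")
    if not host:
        raise IOError("illegal URL syntax: missing host")
    if not sep:
        return host, default_port
    try:
        return host, int(port)
    except ValueError:
        # A bad port is a URL syntax error, so report it as IOError.
        raise IOError("illegal URL syntax: bad port %r" % (port,)) from None
```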
The object returned by scheme_access() is a protocol implementation object, valid for one transaction only. (If the protocol can in fact re-use connections for subsequent transactions, such as the FTP protocol, it should maintain connection information internally.)
Protocol handler objects implement a finite state machine with the following states: META, DATA, DONE. The initial state is META. A call to the getmeta() method causes a transition to the DATA state. In the DATA state, calls to the getdata() method can cause the object to remain in that state or to move into the DONE state. In the DONE state, no methods should be called. The call to getdata() that causes the state change to DONE returns an empty string.
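The state machine can be made concrete with a toy handler serving an in-memory string. The class string_access and its one-argument constructor are hypothetical (a real plug-in takes url, method, params and reads from a connection); the assertions only document the legal states.

```python
META, DATA, DONE = "META", "DATA", "DONE"


class string_access:
    """Toy protocol handler serving a fixed string (hypothetical)."""

    def __init__(self, text):
        self.state = META
        self.text = text
        self.pos = 0

    def getmeta(self):
        assert self.state == META
        self.state = DATA            # META -> DATA
        return 200, "OK", {}

    def getdata(self, maxnbytes):
        assert self.state == DATA and maxnbytes > 0
        chunk = self.text[self.pos:self.pos + maxnbytes]
        self.pos += len(chunk)
        if not chunk:
            self.state = DONE        # the empty string marks DATA -> DONE
        return chunk
```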
Protocol handler objects should provide the following methods. The required initial state and any state changes caused by the call are listed for each method, as well as the possible exceptions.
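- fileno() -> int: Returns a file descriptor that can be used in select() system calls.
- pollmeta() -> (statusmessage, readyflag): Required state: META. Returns a status message and a flag that is true if getmeta() could return without blocking for I/O.
- getmeta() -> (errcode, errmessage, headers): Required state: META; new state: DATA. Returns the error code, the error message, and the headers. The state changes to DATA even if the error code indicates an error (and getdata() should still be called). The headers object must support the has_key() and keys() methods and the x[key] indexing notation. It needn't support modification. Keys should be given as all lowercase.
- polldata() -> (statusmessage, readyflag): Like pollmeta() but for use in the DATA state.
- getdata(maxnbytes) -> string: Required state: DATA; may change the state to DONE. Returns at most maxnbytes of data. If not even a single byte is available, and the end of the stream has not yet been reached, this blocks until at least one byte is available or the stream is closed by the other end. When the stream is closed at the other end, this returns the empty string (and only then).
- close() -> None: Closes the handler, releasing any resources (such as a connection) that it holds.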
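As a sketch of how these methods fit together, here is a hypothetical handler over an already-connected socket, plus a caller that drives it through META, DATA, and DONE. The names socket_access and fetch are invented; a real Grail caller would return to the Tk event loop instead of blocking in select(), and in Python 3 the data arrives as bytes rather than str.

```python
import select
import socket


class socket_access:
    """Minimal handler over a connected socket (hypothetical).

    Metadata is trivial; only the DATA phase touches the socket.
    """

    def __init__(self, sock):
        self.sock = sock

    def fileno(self):
        return self.sock.fileno()

    def pollmeta(self):
        return "ready", 1                # no metadata to wait for

    def getmeta(self):
        return 200, "OK", {}

    def polldata(self):
        # Ready if recv() would not block (data or end of stream pending).
        readable = select.select([self.sock], [], [], 0)[0]
        return ("ready" if readable else "waiting"), (1 if readable else 0)

    def getdata(self, maxnbytes):
        return self.sock.recv(maxnbytes)  # b"" only at end of stream

    def close(self):
        self.sock.close()


def fetch(handler, chunk=512):
    """Drive a handler to completion (hypothetical caller)."""
    try:
        while not handler.pollmeta()[1]:
            select.select([handler.fileno()], [], [])
        errcode, errmessage, headers = handler.getmeta()
        parts = []
        while True:
            if not handler.polldata()[1]:
                # Wait until the descriptor becomes readable.
                select.select([handler.fileno()], [], [])
                continue
            data = handler.getdata(chunk)
            if not data:                 # empty result: state is DONE
                break
            parts.append(data)
        return errcode, b"".join(parts)
    finally:
        handler.close()
```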
The set of error codes that getmeta() may return is defined by the HTTP 1.0 protocol specification, in particular in the section on Status Code Definitions.
Some protocols are simple enough that there is effectively no metadata. In this case, pollmeta() should return readyflag == 1 and getmeta() should return (200, "OK", {}) immediately. Likewise, in some protocols it may be impossible to determine whether getmeta() and/or getdata() might block. In such cases, pollmeta() and/or polldata() should return readyflag == 1. If getmeta() or getdata() may succeed without select() ever noticing readable bytes on the socket, fileno() should return -1.
The module nullAPI exports a class null_access whose methods behave in that way. This class can be used as a base class for simple protocol support plug-in classes (see below).
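The real nullAPI module ships with Grail. What follows is only a sketch of what such a base class could look like, written from the behavior described above (ready flags of 1, a fileno() of -1, an empty 200 response); the name null_access_sketch and the exact state checks are assumptions.

```python
class null_access_sketch:
    """Base class with trivial metadata and no data (hypothetical)."""

    def __init__(self, url, method, params):
        self.url = url
        self.method = method
        self.params = params
        self.state = "META"

    def fileno(self):
        return -1                        # nothing for select() to watch

    def pollmeta(self):
        assert self.state == "META"
        return "Ready", 1

    def getmeta(self):
        assert self.state == "META"
        self.state = "DATA"
        return 200, "OK", {}

    def polldata(self):
        assert self.state == "DATA"
        return "Ready", 1

    def getdata(self, maxnbytes):
        assert self.state == "DATA"
        self.state = "DONE"              # no data: end of stream at once
        return ""

    def close(self):
        pass
```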
Some URL schemes, e.g. mailto, don't return any meaningful data but interact with the user instead. This can be accomplished by returning an error code 204 from getmeta() (meaning "OK, no reply"). The user interaction should preferably be done in a modeless dialog.
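A sketch of that pattern, assuming a hypothetical class mailto_sketch_access whose optional compose callback stands in for opening a modeless dialog; Grail's actual mailtoAPI is organized differently.

```python
class mailto_sketch_access:
    """Handler that interacts with the user instead of returning data
    (hypothetical sketch)."""

    def __init__(self, url, method, params, compose=None):
        self.address = url               # 'user@host' after 'mailto:'
        # compose is a hypothetical callback that would open a
        # modeless compose dialog; by default it does nothing.
        self.compose = compose or (lambda address: None)
        self.state = "META"

    def getmeta(self):
        assert self.state == "META"
        self.state = "DATA"
        self.compose(self.address)       # start the user interaction
        return 204, "No Content", {}     # "OK, no reply"

    def getdata(self, maxnbytes):
        self.state = "DONE"              # nothing to display
        return ""

    def close(self):
        pass
```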
Some URL schemes, e.g. hdl, CNRI's handle scheme, translate into other URL schemes. This can be accomplished by returning a 301 or 302 error code from getmeta() (meaning "resource has moved", either permanently or temporarily) together with a headers dictionary that contains an entry {"location": uri}. The entire return value from getmeta() should be a triple of the following form:

    (302, "Moved", {"location": uri})
Such errors are handled by higher levels. The protocol object should not attempt to implement the redirection itself, since it may be redirecting to a different protocol or to an already cached item.
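To illustrate the division of labor, here is a hypothetical sketch of the higher-level logic that follows such redirections. open_handler stands in for whatever maps a full URL to a fresh protocol handler (it is not a Grail function), and the hop limit of 5 is an arbitrary choice.

```python
def follow_redirects(open_handler, url, max_hops=5):
    """Return (final_url, errcode, errmessage, headers, handler).

    open_handler(url) is a hypothetical callable returning a new
    protocol handler in the META state for the given full URL.
    """
    for _ in range(max_hops + 1):
        handler = open_handler(url)
        errcode, errmessage, headers = handler.getmeta()
        if errcode in (301, 302) and "location" in headers.keys():
            handler.close()
            url = headers["location"]    # may name a different scheme
            continue
        return url, errcode, errmessage, headers, handler
    raise IOError("too many redirections")
```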
If a scheme is merely an alternative spelling for another protocol, this may be accomplished by importing the other protocol module, e.g. (presuming x-http is an old name for http):
    # File x_httpAPI.py
    import httpAPI

    x_http_access = httpAPI.http_access
Some URL schemes, e.g. http, can return authentication errors (error code 401), which mean that the request should be retried after the user has entered some authentication information. This is handled by higher levels. Specific keys are assumed to be present in the headers returned by getmeta().
When a protocol object raises an exception (IOError or otherwise), its state becomes unknown. The only legal method in this case is close().
Although some URL schemes (e.g. mailto) require interaction with the user, most don't. Those that don't are independent of the GUI toolkit used by Grail and should be usable unchanged if Grail were ever rewritten with a toolkit other than Tkinter. This means that protocol modules shouldn't interact with the user in order to request additional information; instead, they should return error codes designed specifically for this purpose (such as 401).
If a protocol is usable for communication with a proxy server (such as the HTTP protocol), its constructor should accept an alternative form of the url (first) argument: if this is a tuple of two elements, the first item specifies the host:port of the proxy server in the usual manner, and the second item is the selector string to be sent to the proxy server. In this case, the protocol module should not interpret the selector string further, as it is a full URL using any protocol scheme supported by the proxy server.
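A handler accepting both forms might normalize its url argument first. The helper split_proxy_url is hypothetical, and parsing the direct form as //host:port/path is an assumption modeled on HTTP.

```python
def split_proxy_url(url):
    """Return (hostport, selector, via_proxy) (hypothetical helper)."""
    if isinstance(url, tuple):
        # Proxy form: ('host:port', full URL); the selector is passed
        # through to the proxy server uninterpreted.
        hostport, selector = url
        return hostport, selector, True
    # Direct form: '//host:port/path' with the scheme already stripped.
    if not url.startswith("//"):
        raise IOError("illegal URL syntax: %r" % (url,))
    rest = url[2:]
    slash = rest.find("/")
    if slash < 0:
        return rest, "/", False
    return rest[:slash], rest[slash:], False
```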
Under certain circumstances it is desirable for a protocol module to get access to the application object. This can be done by executing the statement from __main__ import app.

The following example plug-in module, spamAPI.py, implements the spam: scheme described in its docstring:

    """Protocol Support Plug-In for 'spam:' scheme.

    This is like the 'file:' scheme but prefixes the path with '/tmp',
    so the URL 'spam:splat' is equivalent to 'file:/tmp/splat'.
    """

    from nullAPI import null_access


    class spam_access(null_access):

        def __init__(self, url, method, params):
            null_access.__init__(self, url, method, params)
            # Make the path absolute before prefixing it with /tmp.
            if url[:1] != '/':
                url = '/' + url
            self.url = "file:/tmp" + url

        def getmeta(self):
            null_access.getmeta(self)    # assert, state change
            # Redirect to the equivalent file: URL; higher levels follow it.
            return 302, "Moved", {'location': self.url}

For more examples, see the protocol support modules that are part of the standard Grail source, e.g. nullAPI.py, httpAPI.py, fileAPI.py, ftpAPI.py, hdlAPI.py, and mailtoAPI.py.
There are a number of minor difficulties with the current interface specification. For instance, the creation of the protocol handler object is completely synchronous, and the meaning of the status message returned by pollmeta() and polldata() is undefined. Also, there is an undocumented additional optional argument to scheme_access() in case method is POST; this is currently only used by the HTTP protocol.