Writing HTML Plug-Ins for Grail


Introduction

It is occasionally useful to extend Grail's HTML support with plug-ins. Plug-ins can be used to experiment with proposed new HTML tags, or to implement features that were standardized after Grail was last released. Grail supports all of HTML 2.0 and many features proposed for future versions of HTML.

HTML plug-ins allow you to add support for new tags to the Grail browser, but not to the Grail printing utility. (Future versions of Grail should allow HTML plug-ins to change the way Grail handles its core tags.)

If you are interested in extending HTML, make sure you have a good understanding of the current HTML standards. The HTML 2.0 specification has been accepted as a proposed standard by the IETF, and the World Wide Web Consortium (W3C) is coordinating future development of HTML. See the W3C page on HTML for more information.

Location of HTML Plug-In Modules

An HTML plug-in module is imported the first time a tag it supports is encountered. The module file, located in the Grail HTML plug-ins directory or in the Grail source directory, should be named tag.py. For example, to implement the tag <SPAM>, you'd create a module spam.py.

The Grail HTML plug-ins directory is a subdirectory named html of the Grail user directory. On Unix, the Grail user directory is given by the environment variable $GRAILDIR, or, if that variable is undefined or empty, the subdirectory named .grail of the user's home directory.
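
The lookup rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Grail's own code; the function names grail_html_plugin_dir() and plugin_filename() are hypothetical, and lowercasing the tag name is an assumption based on the <SPAM> / spam.py example.

```python
import os


def grail_html_plugin_dir(environ=None, home=None):
    """Return the HTML plug-ins directory per the rule described above."""
    if environ is None:
        environ = os.environ
    if home is None:
        home = os.path.expanduser("~")
    # $GRAILDIR wins when it is set and non-empty; otherwise use ~/.grail.
    grail_dir = environ.get("GRAILDIR") or os.path.join(home, ".grail")
    # Plug-ins live in the "html" subdirectory of the Grail user directory.
    return os.path.join(grail_dir, "html")


def plugin_filename(tag):
    """Module file name for a tag, e.g. 'SPAM' -> 'spam.py' (assumed rule)."""
    return tag.lower() + ".py"
```

For example, with GRAILDIR unset and a home directory of /home/u, the sketch yields /home/u/.grail/html, and a <SPAM> plug-in would be looked for there as spam.py.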

Contents of HTML Plug-In Modules

If the tag you define has a corresponding end tag (e.g. </SPAM>), the module should define two functions: start_spam() and end_spam(). If it has no end tag, the module should define the single function do_spam(). Tags often have optional end tags, in which case you can usually define only a do_spam() function and let the parser ignore the end tag. This can get complicated, though, because other start tags may implicitly close your <SPAM> tag, and you'll need to keep track of all that state yourself.

The module may define other objects for its own use as long as they don't begin with start_, end_ or do_.

The module may define handlers for additional tags that are only meaningful inside the tag that the module is named after. These extra tag handlers are used, for example, by the standard Grail module that handles the <FORM> tag, to support tags including <INPUT> and <TEXTAREA>, and by the module that handles the <TABLE> tag, to support all the additional table elements.

The start_tag() and do_tag() functions are invoked with two arguments: a parser object and an attribute list. The parser object is an instance of a subclass of HTMLParser. The end_tag() function is invoked with a single argument: a parser object. See the next section for a summary of the interface for parser objects in Grail.
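
A minimal sketch of what a spam.py module for a <SPAM> element with an end tag might look like is given here. The handler names and argument counts follow the description above; the calls list and the StubParser class are hypothetical additions used only to exercise the handlers outside Grail.

```python
# spam.py: hypothetical plug-in module for a <SPAM> ... </SPAM> element.

calls = []  # illustration only: records which handlers ran


def start_spam(parser, attrs):
    # Invoked for <SPAM ...> with the parser object and the attribute list.
    calls.append(("start", attrs))


def end_spam(parser):
    # Invoked for </SPAM> with the parser object only.
    calls.append(("end", None))


class StubParser:
    """Empty stand-in for Grail's parser object (assumption, not Grail code)."""


# Simulate the two calls Grail would make for <SPAM MEAT="ears">...</SPAM>.
stub = StubParser()
start_spam(stub, [("meat", "ears")])
end_spam(stub)
```

A tag without an end tag would define a single do_spam(parser, attrs) function with the same two arguments as start_spam().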

The attribute list contains all of the HTML attributes present in the start tag. HTML plug-ins can receive the argument in two different ways: as a dictionary or as a list of tuples. Although the dictionary representation is preferred, the attributes are passed as a list of tuples by default (for backwards compatibility). Modules that treat the attribute list as a dictionary must define the variable ATTRIBUTES_AS_KEYWORDS in the module's global name space and set its value to true.

Regardless of whether the attributes are passed as a dictionary or a list of tuples, the attribute names are converted to lowercase. The order of the attributes in the list is not defined. For example, the start tag

    <SPAM MEAT="ears,SNOUTS" shape=Square>

would produce either the dictionary

    {'shape': 'Square', 'meat': 'ears,SNOUTS'}

or one of the following lists of tuples:

    [('meat', 'ears,SNOUTS'), ('shape', 'Square')]

or

    [('shape', 'Square'), ('meat', 'ears,SNOUTS')]
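
As a sketch of how a plug-in might cope with both representations, the module below sets ATTRIBUTES_AS_KEYWORDS and also includes a small helper that accepts either form. The helper attrs_to_dict() is hypothetical and is not needed once the flag is set; do_spam() returns a value only so the sketch can be checked.

```python
# Ask Grail to pass attributes as a dictionary rather than a list of tuples.
ATTRIBUTES_AS_KEYWORDS = 1


def attrs_to_dict(attrs):
    """Accept either representation and return a dictionary (hypothetical)."""
    if isinstance(attrs, dict):
        return attrs
    # A list of (name, value) tuples; the order of the tuples does not matter.
    return dict(attrs)


def do_spam(parser, attrs):
    attrs = attrs_to_dict(attrs)
    # Names arrive lowercased; values keep the case used in the document.
    meat = attrs.get("meat", "")
    return meat.split(",")
```

Because the order of the tuples is undefined, looking attributes up by name in this way is safer than relying on their position in the list.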

If the tag includes an attribute that isn't assigned a value, e.g. the ISMAP attribute of the IMG tag, then the attribute is assigned the value None. Plug-ins that use these attributes should test for their presence instead of looking at the value.
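
The presence test can be sketched as follows. The helper has_attribute() is a hypothetical name, and the sketch uses current Python syntax. It shows why looking at the value is misleading: a valueless attribute and a missing one both yield None from a plain lookup.

```python
def has_attribute(attrs, name):
    """True if the attribute appears in the start tag, with or without a value."""
    if isinstance(attrs, dict):
        return name in attrs
    # List-of-tuples form: scan for a matching (lowercased) name.
    for key, _value in attrs:
        if key == name:
            return True
    return False


# <IMG SRC="map.gif" ISMAP> versus <IMG SRC="map.gif">
with_ismap = {"src": "map.gif", "ismap": None}
without_ismap = {"src": "map.gif"}

# Both lookups give None, so the value alone cannot tell the two tags apart.
same_value = with_ismap.get("ismap") is without_ismap.get("ismap")
```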

Interfaces

An HTML plug-in module can access the parser object as well as a Viewer object and a Browser object that are reachable via the parser object.

Here are summaries of their interfaces:

Parser Object Interface

The parser object defines a number of methods that may be invoked by HTML plug-ins:
parser.start_tag(attributes)
For each tag "built-in" to Grail that has a corresponding end tag, except tags related to forms and tables, a method start_tag() is defined. Tables and forms are implemented as (complex!) plug-ins.

parser.do_tag(attributes)
For each tag "built-in" to Grail that has no corresponding end tag, except <ISINDEX> and <LINK>, a method do_tag() is defined. <ISINDEX> is implemented as an example plug-in.

parser.end_tag()
For each tag that has a start_tag() method, a corresponding method end_tag() (without arguments) is defined.

parser.add_subwindow(widget)
Add a subwindow to the viewer. The argument should be a Tk widget (e.g. a Button or Frame instance, but not a Toplevel instance) whose parent is parser.viewer.text (see the Viewer object interface). It should not be packed, though if it has any children these should probably be packed. This method is a wrapper around parser.viewer.add_subwindow() and should be used in preference to the latter.

parser.save_bgn()
Start saving text in a buffer instead of displaying it. This is useful for new HTML elements that interpret the text inside them as some kind of argument (e.g. the <TEXTAREA> element in forms). Use parser.save_end() to retrieve the saved text. Calls to parser.save_bgn() and parser.save_end() can be nested, but the text captured by an inner call is lost to the outer save.

parser.save_end()
Returns the text saved in the buffer since the last call to parser.save_bgn(), and restores the previous operation of data capture.
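
A plug-in that treats the text between <SPAM> and </SPAM> as an argument might pair the two calls as sketched here. StubParser is a hypothetical stand-in written only to make the example runnable; its handle_data() method and internals are assumptions, not Grail's implementation, and it does not model nested saves.

```python
class StubParser:
    """Minimal stand-in for the Grail parser (assumption, not Grail code)."""

    def __init__(self):
        self.savedata = None   # None means "not saving"
        self.shown = []        # text that would have been displayed

    def save_bgn(self):
        self.savedata = ""

    def save_end(self):
        data, self.savedata = self.savedata, None
        return data

    def handle_data(self, data):
        # Route text to the buffer while saving, otherwise "display" it.
        if self.savedata is not None:
            self.savedata += data
        else:
            self.shown.append(data)


captured = []  # illustration only


def start_spam(parser, attrs):
    parser.save_bgn()                  # stop displaying, start buffering


def end_spam(parser):
    text = parser.save_end()           # fetch the buffer, resume display
    captured.append(text.strip())


# Simulate: before <SPAM> eggs, bacon </SPAM> after
parser = StubParser()
parser.handle_data("before ")
start_spam(parser, {})
parser.handle_data(" eggs, bacon ")
end_spam(parser)
parser.handle_data(" after")
```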

parser.get_formatter()
Retrieves an AbstractFormatter (or subclass) instance which may be used to format text flow into the Viewer instance used to display the document.

parser.push_formatter(formatter)
Pushes a new formatter onto the top of the formatter stack. This can be used by plug-ins that can contain arbitrary HTML in their sub-elements. For example, table cells and captions can contain essentially any HTML; however, each cell is implemented as a separate mini-viewer. When a cell tag is encountered, the table parser wraps a new viewer in a new formatter and pushes the formatter onto the stack. All HTML for the cell then gets rendered into the cell's viewer.

parser.pop_formatter()
Removes the top formatter from the formatter stack. A plug-in should only pop formatters which it placed on the stack, and should pop all formatters it places on the stack.
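
The push/pop discipline can be sketched as a start/end handler pair that pushes exactly one formatter and pops exactly one. StubParser and make_cell_formatter() are hypothetical; a real plug-in would wrap a new viewer in a formatter, which this sketch replaces with a placeholder object.

```python
class StubParser:
    """Stand-in with a formatter stack (assumption, not Grail code)."""

    def __init__(self, formatter):
        self._stack = [formatter]

    def get_formatter(self):
        return self._stack[-1]

    def push_formatter(self, formatter):
        self._stack.append(formatter)

    def pop_formatter(self):
        del self._stack[-1]


def make_cell_formatter(parser):
    """Hypothetical factory; stands in for a formatter wrapping a new viewer."""
    return object()


def start_spam(parser, attrs):
    # Content up to </SPAM> is now rendered through the new formatter.
    parser.push_formatter(make_cell_formatter(parser))


def end_spam(parser):
    # Pop only what this plug-in pushed, restoring the previous formatter.
    parser.pop_formatter()


outer = object()
parser = StubParser(outer)
start_spam(parser, {})
inner = parser.get_formatter()
end_spam(parser)
restored = parser.get_formatter()
```

Keeping the push in the start handler and the pop in the matching end handler makes it easy to see that the stack is left as the plug-in found it.
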

The parser object contains a number of instance variables that may be used (but not changed) by HTML plug-ins:
parser.viewer
The Viewer object to which the parser is connected. The Viewer should only be used when the formatter provided by the get_formatter() method does not provide sufficient functionality.