Copyright © 1996, Corporation for National Research Initiatives. All Rights Reserved.
Author: Guido van Rossum
Organization: Corporation for National Research Initiatives (CNRI), Reston, Virginia
Date: May 30, 1996
Version: $Revision: 1.2 $
Grail is an Internet browser written in Python. Why another browser, you may ask? There are technical as well as "worldly" reasons to write yet another browser. Some of the latter include the publicity it generates for Python, and the sheer fun of writing your own browser. Technically, Grail's design tries to get away from the monolithic design of the Mosaic/Netscape generation of browsers, by providing a plug-in architecture. Grail is also the easiest way to support applets written in Python.
The paper starts with a brief overview of the project's goals and non-goals, some statistics, and Grail's legal status. We then proceed to the meat of the paper: a feature overview, and a description of Grail's architecture, with an emphasis on support for applets. The next section explains the security measures (implemented or planned) for applets in more detail. Finally, we say a few words on the choice of Tk and its performance.
One of the project goals for Grail is to provide a "hackable" browser for researchers. If a researcher has a bright idea about how browsing might be improved, it's a pain to have to hack the Mosaic C sources (are there any other browsers whose source is available to researchers?) before the idea can be tried. Java applets can't be used to modify or augment the general browser's behavior, and Netscape plug-ins are a pain to write. Since Python code is often 5-10 times smaller than corresponding C code, hacking Grail will be much more effective, and let the researcher concentrate on the idea instead of its implementation.
As a bonus, it's easier to try out several variants of an idea: there is less "emotional investment" in a particular set of data structures chosen early during implementation, which might restrict later freedom of experimentation. This is not just idle talk: our own development of Grail has seen much restructuring of the code (as the differences between successive releases clearly show), most of which was painless.
A non-goal of the project is to provide Java compatibility. Translation between Java and Python or between the Java virtual machine and the Python virtual machine is simply not feasible, given the vast differences between the two languages. Python is dynamically typed, has late binding, and everything is an object; Java is statically typed, has compile-time binding of names, and has a much more traditional memory model where 32-bit memory words and registers can contain a number of low-level data types such as int, float or pointer. Both languages also depend heavily on their respective, extensive libraries, and most translated programs would only run if their libraries were also translated; this would add considerable complexity since much of these libraries are written in C and interface to the C run-time API of the respective interpreters.
Another non-goal is Netscape compatibility. We're sorry, but we'd be spending all our time reverse engineering Netscape, and we're already forced to do more than enough of that, anyway. We do track the HTML standardization efforts of the IETF HTML working group and the World-Wide Web Consortium, and implement what we can given Tk's limitations. However, we can't promise to render all possible forms of invalid HTML in the same way that Netscape does it.
Grail was started in August 1995 as a quick and dirty demo. It quickly became a serious project, as witnessed by the release of version 0.2 in November 1995, and again by the planned release of version 0.3 (currently in beta) in June 1996.
Grail 0.2 was downloaded roughly 640 times (excluding obvious duplicates) in 5 months. Grail 0.3 beta 1 was downloaded roughly 470 times in 4 weeks! At the time of writing, the 0.3 beta 2 release has been out for 2 weeks and it has been downloaded roughly 170 times (about one third of which were duplicates with the beta 1 downloads).
The 0.3 beta 2 release consists of 108 Python source files, containing 27,000 lines, or 916,000 bytes. Only about a quarter of this comprises the "main" code; the rest consists of ancillary tools, utility modules, and so on.
Grail is currently distributed under a somewhat restrictive license, which prohibits redistribution and commercial use. We are working on a more liberal licensing scheme, allowing unmodified redistributions (even for a fee, like CD-ROM freeware collections), commercial use, and distribution of derived works -- all assuming the CNRI copyright notice is displayed at appropriate moments and places.
We are committed to resolving these and other legal issues before the final release of Grail 0.3.
Grail supports plug-in modules for a number of categories. Without going into a lot of detail (see the Grail plug-in documentation), we will at least mention the main differences between plug-ins and applets.
A plug-in module is explicitly installed by the user (unless it comes pre-installed with Grail, such as the standard protocols and preference panels). Since the user has to take explicit action to install a plug-in, Grail expects plug-in modules to be well-behaved and trusted, and does not restrict their access to system resources. In other words, a plug-in module has the same access rights to the file system, network and other system resources as Grail's mainline code (which in turn is limited only by the user's own access rights).
Applets, on the other hand, are loaded implicitly as part of an arbitrary web page that the user is browsing. Therefore, applets cannot be trusted a priori. Consequently, Grail restricts their access rights in order to prevent applets from overwriting or stealing the user's data. See the next major section.
When the HTML parser encounters a valid applet reference, it sets the wheels in motion for the applet's invocation. It hands the applet's attributes and parameters received from the parser to a newly created AppletLoader instance, which is responsible for the actual loading of the applet.
Grail currently supports three alternative syntax forms for applet loading: the original <APP> tag borrowed from Java, the <APPLET> tag also borrowed from Java (which was introduced because the way <APP> was used was illegal in SGML), and the <OBJECT> tag, specified in the W3C working draft Inserting objects into HTML. (The latter has the advantage that it is possible to create pages containing both a Java and a Python version of an applet.)
When the AppletLoader has received all information from the parser, it determines whether it is feasible to try and create the applet. If this is not the case (e.g. the URL doesn't look like it points to a Python module), it tells the parser to render the alternative contents of the <APPLET> or <OBJECT> container instead.
When the AppletLoader decides to create the applet, it first creates a placeholder frame in the document's text stream which will be the applet's display area. It then creates a BaseReader instance to load the module file asynchronously. Once the module file has been loaded, it creates a ``restricted execution environment'' in which the applet will be executed.
The AppletLoader tries to work asynchronously, within its limitations. Since Grail (currently) doesn't use threads, the Python interpreter has to be invoked synchronously, and Grail is at the mercy of the applet's code to maintain its responsiveness. Normally, this is not a problem, since applets are generally written as a collection of Tk callback routines invoked by input events such as mouse clicks or keystrokes. However, a malicious applet could monopolize the CPU. The user's only resort in this case is to use means outside of Grail to kill the Grail process.
Another situation in which applets fail to work asynchronously is when an applet imports one or more other modules from a remote host (usually but not necessarily the same host from which the applet was loaded). When the Python interpreter encounters the import statement, it expects to be reading the module source file synchronously.
The only way around these problems would be to use threads. Currently, requiring threads for Grail would be a limitation to its portability -- not all Unix platforms support Python threads yet, and for Windows and Mac the situation is worse. There is also the issue of Tcl/Tk's thread safety -- probably disasters will happen if two threads attempt to access Tcl/Tk simultaneously.
A future version of Python (probably 2.0) will have a rewritten interpreter main loop supporting threads in a portable fashion, using coroutine-style thread switching at the C level (which will resemble pre-emptive switching at the Python level). This will allow us to write an asynchronous I/O library and contain applets in their own thread(s).
As explained earlier, applets must be prevented to clobber or steal the user's data without restricting their useful functionality too much. An additional threat is resource overconsumption or denial of service. For instance, this could take the form of an applet soaking up all CPU time or memory on the user's machine, or creating an unending stream of dialog boxes covering the screen and requiring the user to click before they go away. This could require the user to reboot the machine (especially if the user is not particularly knowledgeable about how to get rid of a runaway process).
Grail currently doesn't address the latter issues. A solid approach will have to be based on limiting resource usage in the Python interpreter, which is hard to enforce in the current interpreter. Something should be done about creating too many dialog boxes in the Tkinter.py module though, and monitoring disk space usage by applets is a definite possibility. A cap on CPU usage could be implemented, at least on Unix, by combining the alarm() system call with an uncatchable exception. Capping memory usage seems to be hard, even if we could add a hook to the memory allocator, as it isn't always clear on whose behalf a piece of memory is allocated.
Grail does, however, have an extensive architecture in place to prevent applets from clobbering or stealing user data. This builds on Python's restricted execution mode. It's important to distinguish mechanism from policy: restricted execution mode provides the mechanism for restricting access to resources; Grail implements a particular policy.
Restricted execution mode is enabled on a per-module basis. It is a standard Python feature since version 1.2, enabled by the rexec.py module and supported by a very small amount of mechanism in the interpreter proper: mainly, some backdoor accesses to object internals are disabled. When a module (e.g. an applet) is executed in restricted mode, the unrestricted code (e.g. Grail) controls the implementation of all built-in functions invoked by the restricted module. This includes the importing of other modules by the restricted module -- these will generally also be run in restricted mode. Restricted modules can call functions defined in unrestricted modules, and these functions will run in unrestricted mode -- but only if the unrestricted code has explicitly given the function to the restricted code (e.g. by passing it in through its dictionary of ``built-in'' functions).
There can be any number of entirely separate restricted execution environments. This makes it possible to maintain different trust levels for different applets (or groups thereof), e.g. allowing one applet to access local files but not the network, while allowine another applet to access the network but not local files, and disallowing either applet any access to the global variables and modules of the other.
All access to file system and network resources from Python are accessed through built-in functions or modules, so it is easy to take away these accesses completely. However, this would be too drastic: there are many situations where applets should be allowed some access to the network and/or to the local file system. Therefore, restricted execution mode makes it possible to implement subtler restrictions by replacing instead of deleting built-in functions or modules. As an example, it's easy to replace the built-in open() function with one that allows access to pathnames only if they exist in a particular directory, or one that allows only read access, or any combination of these two.
A peculiar piece of mechanism is a module called Bastion.py. This makes it possible to pass Python class instances into restricted mode in addition to functions and methods, which are restricted mode's first line of defense. In restricted mode, the only thing you can do with a function is to call it. By default, there are no restrictions to the access to class instances, so restricted code could manipulate the instance and class variables. A Bastion gives the restricted code a dummy object with which it can mess as much as it likes -- its instance variables are initialized to (a subset of) the methods of the real object, which itself is inaccessible to the restricted code. This module will be released as part of Python 1.4.
Of course, the restrictions on a module apply recursively to any Python modules it imports. Therefore there is little point in restricting the importing of modules written in Python -- their functionality can always be duplicated by including their source code directly. The real danger lies in modules written in C -- these are disallowed by default and only modules whose names are explicitly listed (like modules math and strop, which are really harmless) can be imported.
Grail currently implements a fairly simple set of security rules.
First of all, the user has a say in the matter. There's a preference panel through which the user controls whether applets are loaded or not. There are three choices: load no applets; only load applets from a list of trusted hosts or domains; or load all applets. Of course, in general, security on the Internet is such that trusted hosts can only really be trusted if there's a secure path to it. Since Grail currently doesn't support secure versions of TCP/IP, this would only be guaranteed for configurations involving a firewall.
Normally, a separate restricted execution environment is created for each host from which applets are loaded. This means that a malicious applet can't access information that the user has passed to other applets. For situations where applets need to cooperate, the user can specify groups of hosts (currently based on domain names) whose applets execute in the same restricted environment.
A detail: strictly speaking, applet grouping and trust is not based on the host from which the applet is loaded, but on the host from which the page containing (or referencing, if you will) the applet is loaded. These are often, but not necessarily, the same host. The idea is that if you trust a host enough to load applets from it, you should trust the applets that it passes on to you from elsewhere; and also that otherwise an untrusted page might play a trick on you by referencing a trusted applet but passing it parameters which might somehow fool the applet into doing something bad (e.g. read data from an untrusted URL).
Within its restricted environment, applets currently have access to the following resources:
Clearly, this policy allows applets to steal data from the user. It does not allow applets to clobber user data, however, so it provides a first line of protection.
A future policy should contain measures to prevent applets from stealing data. Some ideas:
A workable approach will likely combine a number of these proposals.
Grail uses Tk for all its GUI needs, through a modified version of the Tkinter.py module. (This version will be distributed with the next Python release, so there's no need to fear divergence.) The choice for Tk was made initially made on purely pragmatic grounds: it was the only GUI toolkit for Python that was immediately available and rich enough to be able to display HTML. This is probably still the case (WPY's text widget is no match for Tk's, and wxPython would likely require much more programming in order to implement HTML display), but there are clearly drawbacks.
The functionality of the Tk text widget is okay for displaying basic HTML 2.0, but it lacks the flexibility required for advanced features like image alignment, background images, text flowing around images, and tables. There are also some annoying bugs that require complicated work-arounds (scrolling of pages containing large in-line images or tables being the worst offender, followed by the inability to set formatting properties for lines whose first element is an in-line image or table). Tk 4.1 doesn't seem to address these issues, except that the "grid" geometry manager makes laying out tables simpler.
The best we can say about this is that the Tcl/Tk development group at Sun has written a web browser in Tcl, and they are faced with the same problems -- perhaps this encourages them to fix the bugs and add the required functionality.
Tk's performance is another problem. The current version still leaks memory at an alarming rate, and, much to our dismay, even though Grail's HTML parser is written in Python, more time is spent rendering text than parsing it! We are currently tracking down where the time is spent. Preliminary results show that there are inefficiencies all over the place: the Tkinter.py module (which provides the object-oriented view of Tk for Python programmers) imposes quite a bit of overhead in some cases, the C-level interface between Python and Tcl/Tk (tkintermodule.c) has the overhead of constructing a Tcl command from a list of strings which is immediately parsed by Tcl into an identical list of strings, some operations of the Tk text widget are slow, and last but not least, Grail over-uses certain Tk features.
While this paints a rather grim image of performance, at least there is room for much improvement. For instance, we plan to streamline Tkinter.py, re-implement some of its slower operations in C, and trim some of Grail's most excessive over-use. We also have a preliminary version of a hack that eliminates the Tcl parsing overhead. Each of these improvements probably won't make a huge difference, but if we add them all up, we may be able to make considerable inroads.
Other areas where performance can be improved have nothing to do with Tk: the performance of Python itself can be improved (see another presentation) and the HTML parser has been rewritten in C, starting with an SGML lexer by Dan Connolly. (Caution about the legality of using Dan's code has prevented us to include this in the beta 2 release; this will be worked out.)
Finally, we suspect that the performance of Tk's text widget could be a problem. We have found that creating and configuring a text widget takes a long time (although so far we're not 100% sure that this isn't because of the large number of tag configuration and event bindings that we need to apply to each widget). We have also noticed that using Rivet (a Tcl-free interface to Tk, written by Brian Warkentine) doesn't improve Grail's performance much. This would indicate that eliminating the Tcl interpreter (which is Rivet's main virtue) doesn't save much, meaning that the overhead is either in Tk, or in Grail or Python. This version of Grail doesn't interface directly to Rivet. Instead, it uses a modified version of Tkinter.py (cutely named Trinket.py, written by David Ascher), which incurs roughly the same overhead as the original Tkinter.py. Therefore, gauging Rivet's performance gain using this configuration is hard, but it is at least consistent with the theory that the text widget is slow.
I still believe that the choice for Tk was a good one. Tk has good support prospects, is being ported to Windows and Mac, and has a really simple, powerful programming model (no redraw events). Using the latest Tk ports, Grail can run on Windows and Mac, although its reliance on external Unix tools for certain functionality such as JPEG decoding and printing makes it limp rather than run, and the stability of the Tk ports (especially to the Mac) is not as good as that of the Unix port -- yet.
A side effect of the decision to use Tk for Grail has been a large number of improvements to the Tkinter.py module. Many small bugs have been fixed, and interfaces to many new commands added to Tk in version 4.0 or 4.1 have been added. Several new utility modules have also been developed, in particular a powerful file selection dialog module, and an improved simple dialog module. There's also a utility module with lots of "goodies" like a routine that creates a text widget with two scroll bars.