Application Framework Design
June 13, 2022
A new graphical application that aims to provide a unique interaction deserves a new application framework designed for it!
This new framework needs to support new kinds of user interaction idioms that are unique to the multiuser 3D environment. And more notably, it needs to support live programming in a consistent manner.
One of the major components of a user interface framework is how a user event is “routed” and how the right event handler is invoked. Event routing means that when a user clicks their mouse or presses a key on their keyboard, the routing mechanism selects the right object and invokes the right event handler, based on the location of the event and some kind of focus policy. On the surface, it may sound trivial, but in practice it has a lot of intricacies, as it must be designed together with all other parts of the framework. In this blog entry, I will discuss the ideas behind the Microverse Framework.
As a bit of a computing history buff myself, I cannot help but explain how our design draws upon ideas from the past. So let me present a (very biased) history of user interface frameworks.
The first application framework I'd like to mention is Smalltalk MVC[1]. MVC stands for Model-View-Controller, and nowadays it is often used to refer to various design patterns in a generic manner. However, in this context, I use it to refer to the specific application framework. The main idea of Smalltalk MVC was that a visual element of an application should be separated into those three parts, each handling a well-defined responsibility. When the user pushes the mouse button, the system chooses the right controller based on the location of the mouse, the controller invokes a method on the model to cause some state change, and the view updates its screen area based on that state change.
The MVC design done in the late 70's got some inspiration from the way 3D graphics would render objects. A "camera" (or cameras), or the view, is a separate entity from the "object", or the model, to be rendered. When a model is rendered, its properties and the camera's properties are combined to create the final image.
Trygve Reenskaug and Smalltalk pioneers created MVC by taking various ideas like this and abstracting them into something that could be used to write generic 2D graphical applications. Incidentally, we can say that the Model-View separation of the Croquet OS architecture is a descendant of this idea.
I myself did not encounter Smalltalk until much later. Around 1995, when I was a naive college student, Java hit the market with much fanfare. Naturally, I thought Java was a great invention. Java 1.0 came with an application framework called AWT (Abstract Window Toolkit)[2], and I thought that that was the modern way to make graphical applications. But AWT had a lot to improve, and the Java people must have had a very short deadline to deliver the first version.
What were the problems in AWT? AWT forced the developer to make a subclass of a system-provided base class (such as Button) and to override basic methods like “paint()” and “action()” to make a customized widget for your application. This meant that the developer had to put both the visual appearance and the behavior customization in one class. If you wanted to make a square button and a circle button for two different actions at different places, you would end up making four subclasses that inevitably contained duplicated code.
The next versions of AWT allowed the developer to add "action listeners" for events, and soon after that they introduced a new framework called “Swing” to rectify this inflexibility and other problems. They still required the developer to create classes implementing the ActionListener interface, but at least the logical action and the visual appearance could be separated, and the developer could turn listeners on and off at runtime. So I was learning that application framework design has a large design space.
Now, let us take a look at a very influential framework called Morphic[3]. Morphic was originally created for Self, a prototype-based object-oriented programming system. The main idea in Self was that an object could have any number of instance-specific methods and properties (those were actually the same thing), which could be added and removed dynamically. This allowed the user of the Self system to explore the design space with maximum flexibility and create an application interactively.
I first learned the version of Morphic that was ported to Squeak Smalltalk[4]. Squeak was a class-based object-oriented system, and the Morphic concepts were adapted to it. When I learned about the origin of Morphic in Self, I realized that the Squeak version was a bit more rigid than the Self version, and that it could have been implemented differently to allow more flexibility (and we certainly experimented quite a bit). Looking back, I learned about the intricate relationship between languages and application frameworks from this experience.
The most powerful idea of Morphic is being able to construct the entire system with only one type of object (called Morphs). Some features that might typically be treated as external primitives are instead represented by Morphs in the system. For example, the "HandMorph" represents an input device and synthesizes event objects from user inputs, and the "WorldMorph" represents the root of the display scene as well as the display device itself. The user-level methods of those Morphs implement the event routing mechanism. This means that the uniformly programmed system allows the user to enhance the system from within. One example was supporting multiple users: if one wanted to create a networked collaborative version of Morphic, they could add multiple HandMorphs to the system and connect them to other users over the network. It was not too different from adding a few more simple Morphs.
Of course, other language systems had application frameworks with notable ideas by the time Java debuted with its rather uninspiring framework. I have heard quite unkind words uttered toward Java at computer science conferences for ignoring the good ideas out there.
Around that time, JavaScript was created by taking ideas from Self and a language called Scheme, with Java-like syntax for marketing reasons[5]. Its graphical application framework, later formalized as the Document Object Model, surprisingly got many things right. In particular, instance-specific object customization, owing to the prototype-based language, was a big winner.
Typically in a graphical application, you need only one instance for each kind of element. You could still inherit from other objects if you need to make more, but you can add or remove event listeners and customize other aspects of a single object.
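For example, a single DOM element can be given and stripped of behavior at runtime with the standard addEventListener and removeEventListener calls, with no new class definitions involved:

```javascript
// One button instance gets its own behavior; no subclassing is needed.
const saveButton = document.createElement("button");
saveButton.textContent = "Save";
document.body.appendChild(saveButton);

function onSave(event) {
  console.log("saving...");
}

// Attach the behavior to just this instance...
saveButton.addEventListener("click", onSave);

// ...and detach it again at runtime when it is no longer wanted.
saveButton.removeEventListener("click", onSave);
```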
One more idea I'd like to mention is the "first responder" idea introduced in NeXTstep (which was later used in Apple's AppKit)[6]. This mechanism gives user programs ways to control the event routing. It is typically used to capture the mouse pointer or to handle keyboard focus, but it can also be used in more creative ways.
The Microverse application framework is basically the combination of all the good ideas mentioned above. The avatar is a kind of graphical object (card) and plays the roles of HandMorph and WorldMorph (although we didn’t get around to making the camera into a card). When a new user event is generated, the Microverse event routing mechanism first looks for the event’s first responder. The state of the event, set by the modifier keys, is used to determine whether the avatar (or some other object) should be the first responder. For example, a ctrl-pointerDown event may be sent to the avatar, if the avatar is the first responder for events with the control key pressed. The avatar then starts the edit-control action on the card determined by performing a raycast test at the event location. If there is no first responder (as for an event without any modifier keys), the event is sent to the object that has an event listener for that type of event.
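To make that flow concrete, here is a rough sketch of the dispatch logic in JavaScript. All of the names here (firstResponders, responderKey, routeEvent, the listeners map) are illustrative assumptions for this blog entry, not the actual Microverse API:

```javascript
// Illustrative sketch of the routing described above; names are assumptions.
const firstResponders = new Map(); // "eventType:modifiers" -> responding card

function responderKey(type, ctrlKey) {
  return `${type}:${ctrlKey ? "ctrl" : ""}`;
}

function routeEvent(event, raycast) {
  // 1. A first responder registered for this event type and modifier state
  //    (e.g. the avatar for ctrl-pointerDown) gets the event first; its own
  //    listener performs the raycast and starts edit-control on the hit card.
  const responder = firstResponders.get(responderKey(event.type, event.ctrlKey));
  if (responder && responder.listeners.has(event.type)) {
    responder.listeners.get(event.type)(event);
    return;
  }
  // 2. Otherwise the event goes to the card hit by the raycast, if that card
  //    has a listener registered for this event type.
  const card = raycast(event);
  if (card && card.listeners.has(event.type)) {
    card.listeners.get(event.type)(event);
  }
}
```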
The major win here is that the edit-control mechanism can be implemented in the same manner as event listeners for a normal card, so edit control can be toggled if necessary. (Again, however, from a purist’s point of view, we did not get around to making the event routing mechanism itself into a behavior.)
In the mouse pointer-based control scheme, when the mouse button is pressed down on a card, a user card may need to capture the pointer until pointerUp occurs. Upon receiving the first pointerDown, the card can make itself the first responder for subsequent pointerMove events until pointerUp occurs. In this way, the card can keep handling pointerMove events even if the on-screen position goes outside of the rendered area of the card.
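Continuing the illustrative sketch above (again, these are not the real Microverse calls), the capture pattern for a draggable card might look roughly like this:

```javascript
// Hypothetical drag behavior for a card, reusing the sketch's firstResponders map.
class DragBehavior {
  constructor(card) {
    this.card = card;
    card.listeners.set("pointerDown", (e) => this.onPointerDown(e));
    card.listeners.set("pointerMove", (e) => this.onPointerMove(e));
    card.listeners.set("pointerUp", (e) => this.onPointerUp(e));
  }

  onPointerDown(event) {
    // Capture the pointer: register this card as the first responder so that
    // pointerMove keeps arriving here even outside the card's rendered area.
    firstResponders.set(responderKey("pointerMove", false), this.card);
  }

  onPointerMove(event) {
    // ...move the card to follow the pointer...
  }

  onPointerUp(event) {
    // Relinquish the capture when the gesture ends.
    firstResponders.delete(responderKey("pointerMove", false));
  }
}
```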
An interesting problem for us is how we can support an incomplete program that is being developed interactively by users. The card that captures the mouse pointer by becoming the first responder is supposed to relinquish the first responder status upon pointerUp, but the user program may forget to do so (as I often do). We want something (in this case the avatar) to handle the pointerUp anyway and remove the first responder added by the user card.
How does the avatar even handle the pointerUp? We added the notion of a "last responder" for an event, and for certain kinds of events such as pointerUp and keyUp, we add the avatar as the last responder. The event listeners on the avatar clear any left-over temporary first responders.
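In the same illustrative sketch, the last-responder step could be an extra pass at the end of routeEvent; once more, the names are assumptions rather than the real implementation:

```javascript
// Illustrative: last responders run after the normal dispatch for an event.
const lastResponders = new Map(); // eventType -> cleanup function

// The avatar registers cleanup for pointerUp (keyUp would be handled the same way).
lastResponders.set("pointerUp", (event) => {
  // Clear any first responder that a card forgot to relinquish, so subsequent
  // pointerMove events are routed normally again.
  firstResponders.delete(responderKey("pointerMove", false));
});

// routeEvent (from the earlier sketch) would then end with something like:
//   const cleanup = lastResponders.get(event.type);
//   if (cleanup) cleanup(event);
```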
A user may want to remove or replace the default actions of the avatar, and that is perfectly doable. An avatar is just like a card, so its event listeners or first responder registration can be changed dynamically.
I wrote about a lot of 2D frameworks above, but aren't we dealing with 3D? From my perspective, all those ideas from 2D frameworks work very well in the 3D environment, especially when people use a conventional display and input devices. Substituting raycasting for the 2D hit detection was pretty much all it took.
Of course, we anticipate that we will have richer display technologies and input devices for Microverse. Your hands, feet, and body will become your input devices. Even in that kind of environment, it seems to me that the flexible architecture we designed for Microverse can be adapted without changing the core. After all, some members of our group have done virtual reality and 3D frameworks in the past. And the good news is that we have written the entire thing by ourselves from scratch. I am confident that we can modify it in any way we like when it comes time to support those displays and inputs.
- [1] The Model-View-Controller (MVC): Its Past and Present, Trygve Reenskaug, 2003
- [2] The Java Application Programming Interface: Window Toolkit and Applets, James Gosling, Frank Yellin, and the Java Team, 1996
- [3] Directness and Liveness in the Morphic User Interface Construction Environment, John Maloney and Randy Smith, 1995
- [4] An Introduction to Morphic: The Squeak User Interface Framework, John Maloney, 2002
- [5] Document Object Model (DOM) Level 2 Core Specification, W3C, 2000
- [6] NeXTstep Reference, 1990