Withindows
A Unified Framework for the Development of Desktop and Immersive User Interfaces

Withindows is a theoretical framework for developing user interfaces that can operate both on the desktop and in full 3D immersion without redevelopment. It simultaneously addresses problems with usability, efficiency, fatigue and flexibility associated with the development of immersive applications. The framework not only provides a route for existing desktop applications to make the transition into 3D but also optimizes several tasks commonly undertaken in virtual environments.

The framework is based on the novel combination of two existing techniques: image-plane selection and through-the-lens (TTL) techniques.

Image-plane Selection Image-plane Underhand
Image-plane selection, also called occlusion selection, involves occluding the object to be selected with the hand or a virtual cursor attached to it. Traditional 2D user interface elements can be used for long periods with low fatigue when placed on a window positioned below the hand.
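The occlusion test can be sketched as a ray cast from the eye through the virtual cursor, selecting the nearest object that the cursor covers. The Python below is only an illustrative sketch, not the framework's implementation: the function name `image_plane_pick`, the sphere-only scene and the tuple layout are all assumptions made for the example.

```python
import math

def image_plane_pick(eye, cursor, spheres):
    """Return the name of the nearest sphere occluded by the cursor as
    seen from the eye, or None if the cursor covers nothing.

    eye, cursor: (x, y, z) positions (hypothetical world coordinates).
    spheres: list of (name, center, radius) tuples (hypothetical scene format).
    """
    # Direction of the ray from the eye through the cursor.
    d = [c - e for c, e in zip(cursor, eye)]
    length = math.sqrt(sum(x * x for x in d))
    if length == 0:
        raise ValueError("cursor must not coincide with the eye")
    d = [x / length for x in d]

    best_name, best_t = None, math.inf
    for name, center, radius in spheres:
        # Standard ray-sphere intersection with a unit-length direction.
        oc = [e - c for e, c in zip(eye, center)]
        b = sum(o * x for o, x in zip(oc, d))
        c = sum(o * o for o in oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            continue  # the ray misses this sphere
        t = -b - math.sqrt(disc)
        if t < 0:
            continue  # sphere is behind the eye (or the eye is inside it)
        if t < best_t:
            best_name, best_t = name, t
    return best_name

scene = [
    ("near", (0.0, 0.0, -2.0), 0.5),
    ("far", (0.0, 0.0, -10.0), 1.0),
    ("side", (5.0, 0.0, -5.0), 1.0),
]
eye = (0.0, 0.0, 0.0)
print(image_plane_pick(eye, (0.0, 0.0, -0.5), scene))
print(image_plane_pick(eye, (0.0, 0.0, -0.2), scene))
```

Both calls pick the same object even though the cursor sits at different distances from the eye: only the direction from eye to cursor matters, which is why the hand can rest in a low, comfortable position.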
Through the Lens Image-plane on TTL
Through-the-lens techniques a) place a virtual camera at the eye to create a view within the window. b) Objects can be grabbed and manipulated by c) reaching into the resulting view frustum. Image-plane selection on TTL windows facilitates 2½D constrained object manipulations without reaching. a) Spatial movements of the virtual cursor translate into b) constrained manipulations at a distance.


Ygdrasil IDE - A proof-of-concept graphical interface for the Ygdrasil VR development platform.
A series of design sketches illustrating a number of interaction techniques.
A paper on the framework presented at the 3DUI Symposium at VR 2008.
The full dissertation document.


There are a number of problems common to 3D user interfaces:

limited flexibility
Applications and their interfaces must be developed separately for desktop and 3D environments.
high fatigue
Users are routinely asked to stand, bend, sway or reach for objects placed out in front of them.
low usability
Users frequently spend their limited time having to learn a new interface modality.
low efficiency
Simple search and object manipulation tasks are generally less efficient in 3D.

The Withindows framework addresses each of these problems:

high flexibility
Because they encapsulate the primary virtual environment tasks of global search, object selection and object manipulation, through-the-lens windows can be presented unaltered on the desktop and behave as traditional 3D applications with no redevelopment effort.
low fatigue
Through-the-lens windows can be positioned directly below the hand in 3D environments in order to reduce fatigue and allow longer term work to take place.
high usability
Users can be exposed to and learn the immersive interface on the desktop and then concentrate on aspects of the application unique to 3D when using the system in an immersive setting. Exposure to the 3D interface on the desktop is also likely to provide resiliency to the low-resolution display conditions frequently associated with 3D display technologies.
high efficiency
In the absence of true haptic feedback, constrained 2D manipulations from optimal alternate viewpoints overcome inefficiencies associated with methods that rely solely on first-person perspective.

The Withindows framework has a number of additional features including:

3D Viewpoint Management
The framework details a number of ways that viewpoints within TTL windows are managed. The default mechanism is the familiar object-centered click-and-drag zoom, rotate and pan functions over the window. More intuitive methods involve a synchronous zoom and rotate mapped to the hand and manipulations of the window (figure at right). The framework also proposes the addition of window icons to register the secondary scene to the surrounding scene and to teleport the user to the viewpoint within the viewing window.
Proposed Window Icons
Relationship between Window and Secondary Scene
Normally the secondary scene a) is locked with respect to the viewing window. Unlocking the secondary scene b) allows window manipulation to change the viewpoint into the secondary scene.

3D Object Manipulation
While the default 2½D object manipulation provides a smooth transition from familiar desktop applications, 3D input devices can also be used without superseding those interaction methods. The image-plane grab technique is spatially isomorphic and prevents unnecessary bending and reaching because it uses a depth-insensitive selection technique. Image-plane grab allows full orientation, keeps the object under the virtual cursor at all times and adjusts the object depth relative to hand depth (figure at right). In contrast to previous through-the-lens methods, image-plane grab allows the user to move and position objects of all sizes between the surroundings and viewing windows by resetting the initial hand and object depths when moving outside the window frustum.
Image-plane Grab Within TTL Windows
The ratio between hand depth and object depth is defined by the ratio between initial object and hand depth. Because manipulations through a TTL window use the virtual camera position, zooming out twice the distance doubles the depth range over which the object can be moved.
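The depth mapping described in this caption can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: depths are measured from the viewpoint in use (the user's eye, or the virtual camera for a TTL window), and the function names `grab_depth_gain` and `object_depth` are hypothetical, not part of the framework.

```python
def grab_depth_gain(initial_object_depth, initial_hand_depth):
    """Ratio fixed at the moment of the grab (hypothetical helper).

    Both depths are distances from the viewpoint and must be positive.
    """
    if initial_hand_depth <= 0 or initial_object_depth <= 0:
        raise ValueError("depths must be positive")
    return initial_object_depth / initial_hand_depth

def object_depth(hand_depth, gain):
    """Object depth that preserves the ratio captured at grab time."""
    return hand_depth * gain

# Grab an object 4 m from the viewpoint with the hand 0.5 m away.
gain = grab_depth_gain(4.0, 0.5)
moved = object_depth(0.6, gain) - 4.0        # hand pushed 0.1 m further out

# Same grab through a TTL window whose virtual camera is zoomed out so the
# object is twice as far from the camera.
gain_zoomed = grab_depth_gain(8.0, 0.5)
moved_zoomed = object_depth(0.6, gain_zoomed) - 8.0

print(gain, moved, gain_zoomed, moved_zoomed)
```

With these numbers the same 0.1 m hand movement displaces the object about 0.8 m in the first case and about 1.6 m in the zoomed-out case, matching the statement that zooming out to twice the distance doubles the usable depth range.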

Image-plane Selection in Stereo
A potential problem with using image-plane selection in stereo environments is the ambiguity that arises when the finger, hand or virtual cursor is closer than the selection referent. A dominant-eye cursor resolves this ambiguity and makes it possible to accurately select content in a stereo environment (figure at right). This formulation also resolves control-display non-linearities that arise from simply assigning object depth to the cursor in desktop stereo configurations.
Non-linear behavior of a Simple Depth Cursor
Dominant Eye and Reinforcing Cursor

An ambiguous selection condition in stereo a) can be resolved by presenting the virtual cursor exclusively in the dominant eye. Once over a valid selection b) a reinforcing cursor can then be introduced into the non-dominant eye to give the virtual cursor the apparent depth of that object.

Transitional Configurations
Encapsulating common virtual environment tasks within TTL windows allows those windows to be used on the desktop without modification. Conversely, using image-plane selection on 2D windows in 3D allows desktop applications to migrate easily to 3D. Moreover, there are a number of best-in-class techniques used on transitional configurations between desktop and immersion that can be understood or even improved when formulated using image-plane techniques. A technique called perspective cursor uses head tracking and screen location to smoothly move the cursor across multiple displays. When developed using an image-plane approach, it becomes trivial to do fully registered image-plane or clutched virtual-mouse selection on either mono or stereo displays. Projector-based augmented reality typically uses a standard mouse or touch interaction at the surface. An image-plane approach allows interactions at a distance without introducing additional complexity or cost (figure at right).
Image-plane Selection in Projection-based Augmented Reality
A cursor displayed on a surface can be a) projected onto a plane perpendicular to the ray cast between user viewpoint and that cursor. The cursor can then be manipulated within that plane b) by tracking the relative motion of the hand within the image of a camera located near the eye. This technique only requires a rough XYZ position of the user viewpoint and transitions smoothly into touch at the surface.
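One way to read the two steps in this caption is sketched below in Python. It is an interpretation rather than the published method: it assumes the tracked hand motion has already been converted to a 2D offset in metres, that the display surface is a plane given by a point and a normal, and that the moved cursor is re-projected onto that surface along the line from the viewpoint. All function and parameter names are hypothetical.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    if n == 0:
        raise ValueError("cannot normalize a zero-length vector")
    return scale(a, 1.0 / n)

def move_cursor(viewpoint, cursor, hand_delta, surface_point, surface_normal,
                world_up=(0.0, 1.0, 0.0)):
    """Move a surface cursor by a 2D hand offset (hypothetical sketch).

    a) Build a plane through the cursor, perpendicular to the
       viewpoint-to-cursor ray, and move the cursor within it.
    b) Re-project the moved point onto the surface along the line from
       the viewpoint so the cursor stays on the display surface.
    Fails if the ray is parallel to world_up or to the surface.
    """
    ray = normalize(sub(cursor, viewpoint))
    right = normalize(cross(ray, world_up))   # plane axis for hand x motion
    up = cross(right, ray)                    # plane axis for hand y motion
    moved = add(cursor, add(scale(right, hand_delta[0]),
                            scale(up, hand_delta[1])))
    direction = sub(moved, viewpoint)
    denom = dot(direction, surface_normal)
    if abs(denom) < 1e-9:
        raise ValueError("view line is parallel to the surface")
    t = dot(sub(surface_point, viewpoint), surface_normal) / denom
    return add(viewpoint, scale(direction, t))

# Cursor on a wall facing the user: the offset maps directly onto the wall.
wall = move_cursor((0, 0, 0), (0, 0, -2), (0.1, 0.0), (0, 0, -2), (0, 0, 1))
# Cursor on the floor, seen obliquely: an upward hand motion pushes it away.
floor = move_cursor((0, 0, 0), (0, -1, -2), (0.0, 0.1), (0, -1, 0), (0, 1, 0))
print(wall, floor)
```

Only the viewpoint position enters the calculation, which is consistent with the caption's remark that a rough XYZ position of the user viewpoint is sufficient.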

Advantages of Image-Plane Selection
Ray-casting, using the hand to point at objects, has been the predominant selection technique in virtual environments. The technique relies heavily on proprioceptive feedback and therefore requires 6 degree-of-freedom tracking and stereo display. However, image-plane selection is effectively a 2D technique from the viewing position of the user and facilitates the addition or removal of stereo display, 3D mouse or head tracking to a desktop configuration. Image-plane selection is also potentially superior to ray-casting because it maintains a linear control-display relationship on surfaces presented at oblique angles, is insensitive to depth of presentation and transitions smoothly into and out of touch (figure at right).

Image-plane Transitions Smoothly to Touch

As the hand approaches a surface, a) image-plane selection transitions smoothly into touch selection on that surface. The selection point of a ray cast from the hand b) will move nonlinearly as it approaches the surface and may interfere with the ability to focus on a desired selection point.