Periodic Reporting for period 2 - GEMINI (Gaze and Eye Movement in Interaction)
Reporting period: 2023-04-01 to 2024-09-30
Eye movement and gaze are central to human interaction with the world. Our visual system not only enables us to perceive the world, but also provides exquisite control of the movements we make in the world. The eyes are at the heart of this, never still, and in constant interaction with other parts of our body to direct visual attention, extract information from the world, and guide how we navigate and manipulate our environment. The movement of our eyes, in coordination with head and body, implicitly reflects our goals, interests and information needs, while we also have fast control over where we direct our gaze, to focus attention, communicate interest, establish shared attention, and express intent. This makes gaze and input from the eyes in principle a formidable modality for human-computer interaction (HCI), but state-of-the-art interfaces are still marred by poor usability, control and expressivity.
Gaze has been studied in HCI since the eighties, but this research has focussed narrowly on eye-tracking to detect fixations of the eye on objects of interest. To capture fixations, eye movement has been viewed in isolation, and other movement, in particular of the viewer's head and body, has been treated as noise. The result is interfaces that constrain their users and require them to use their gaze unnaturally, with exaggerated eye movements, while suppressing users' natural tendency to move their head and body in support of gaze. In GEMINI, we are developing a completely new foundation for interaction design with gaze and eye movement, by considering how eye, head, hand and body movements work in concert.
We focus our research on three aspects of eye movement, to enable advances in 3D interaction:
Eye and Motion: the eyes are continually responding to external motion to stabilise our vision, for example when we are walking, looking out of a train window, or gazing at a moving object. These stabilising movements have been ignored in HCI, whereas we will study their utility for robust detection of attention to objects in 3D - a major problem for gaze-based interfaces, as an eye-tracker can only provide an estimate of gaze direction.
Eye and Head: we tend to think of ‘looking’ as something that we do with only our eyes. However, head movement is an integral part of our visual system, to support eye saccades and maintain a comfortable eye-in-head position. In eye-tracking applications, head movement has been treated as a problem rather than as part of gaze, and filtered out. In contrast, we will explore how input from eye and head can complement each other for hands-free interaction in 3D.
Eye and Hand: we naturally look at objects that we aim to manipulate. In HCI, gaze has therefore been viewed as an alternative to manual input, where objects are selected by gaze and dwell time instead of mouse and click. This has advanced accessibility but exposed problems of Midas Touch, accuracy and expressiveness. In contrast, we will study how the natural coordination of eye and hand can be leveraged to develop more efficient input techniques for selection and manipulation of objects in 3D.
Eye and Motion:
We developed a novel interactive method to robustly detect visual attention to an object, based on vergence eye movement. When objects appear close to each other in the visual field, it is difficult to infer which one a user is focussing on, as the eyes exhibit natural jitter during a fixation and eye-tracking has limited precision. This is a problem in particular in 3D environments, where objects can appear close to each other along the line of sight even when they are at different distances in the environment. The method we developed is to move any object of interest back and forth along the line of sight, while at the same time scaling the objects so that they appear unchanged in size. When the user is attending to an object as it moves closer to the eyes, the eyes respond with vergence eye movement, rotating toward each other to maintain focus on the object, and vice versa rotating outward when the object moves further from the eyes into the environment. We developed an algorithm that detects the correspondence between the back and forth movement of an object and the vergence of the eyes. In a first study, we showed that the method is robust against false activation, as vergence in conjunction with a moving object is slower than the vergence behaviour that occurs when we switch attention from an object that is nearer to one that is further away (or vice versa). In a second study, we showed that the method can be used to support selection from among possible target objects even when the objects are very small and in close proximity, based on having them move back and forth in different patterns (in the simplest case, one moving closer while the other moves farther). This is an exciting result, as it enables detection of attention to objects even when eye-tracking is inaccurate, because the correct match is established by the relative movement of the eyes.
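For illustration, a minimal sketch of how such a correspondence test could be implemented is given below, assuming an eye-tracker that reports left and right gaze directions and a fixed interpupillary distance; the window handling and correlation threshold are assumptions made for the example, not the algorithm evaluated in our studies.

```python
import numpy as np

IPD = 0.063  # assumed interpupillary distance in metres (illustrative value)

def measured_vergence(left_dir, right_dir):
    """Vergence angle (radians) between normalised left/right gaze direction vectors."""
    cos = np.clip(np.dot(left_dir, right_dir), -1.0, 1.0)
    return np.arccos(cos)

def expected_vergence(distance_m):
    """Vergence angle expected when fixating a point at the given distance (metres)."""
    return 2.0 * np.arctan2(IPD / 2.0, distance_m)

def detect_attended_object(vergence_samples, object_depths, threshold=0.6):
    """vergence_samples: measured vergence angles over a time window (1D array).
    object_depths: dict mapping object id -> array of that object's scripted
    depth along the line of sight over the same window.
    Returns the object whose depth-induced vergence pattern correlates best
    with the measured vergence, or None if no correlation exceeds the threshold."""
    measured = np.asarray(vergence_samples, dtype=float)
    measured = measured - measured.mean()
    best_id, best_corr = None, threshold
    for obj_id, depths in object_depths.items():
        predicted = expected_vergence(np.asarray(depths, dtype=float))
        predicted = predicted - predicted.mean()
        denom = np.linalg.norm(measured) * np.linalg.norm(predicted)
        if denom == 0.0:
            continue
        corr = float(np.dot(measured, predicted) / denom)
        if corr > best_corr:
            best_id, best_corr = obj_id, corr
    return best_id
```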
Eye and Head:
Head movement naturally contributes to gaze and is often used to approximate where a user is looking, for example to adjust the content shown on a display to the direction in which the user is facing. However, head movement can also be used independently, for example to perform a gesture or to control the movement of a cursor. We have conducted a series of work on the combination of eye and head movement for precise pointing in interfaces. Gaze is well known to be fast for pointing at objects but lacks accuracy, while head movement provides excellent control for refinement of a cursor. The challenge in combining the modalities is that the head also contributes naturally to the initial gaze shift to a target. The head movement associated with the gaze shift is preprogrammed with the eye saccade and still in progress when the eyes have already reached the target. It therefore cannot be used immediately for refinement of the gaze cursor, as this would lead to overshooting. To address this problem, we have proposed the use of machine learning to classify head movement into Head-Gaze versus Head-Gestures. In a first study, we found that we were able to classify the two types of movement with high accuracy. In follow-on work, we developed a technique based on the classifier to automatically switch cursor control from gaze to head for refinement and improve hands-free pointing performance. The insight that not all head movement is the same also has wider implications for HCI, as the classification can enable use of head gestures while avoiding unintentional input by Head-Gaze, and vice versa use of Head-Gaze to naturally switch attention (for example between different windows) while avoiding Midas Touch by Head-Gestures.
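As an illustration of how a classifier could drive the handover of cursor control from gaze to head, the sketch below trains a generic classifier on simple eye and head velocity features; the feature set, library choice and handover logic are illustrative assumptions, not the classifier developed in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(head_vel, eye_in_head_vel):
    """Summarise a short window of motion into a feature vector.
    head_vel, eye_in_head_vel: (N, 2) arrays of yaw/pitch angular velocity
    in deg/s. The specific features are illustrative assumptions."""
    h_speed = np.linalg.norm(head_vel, axis=1)
    e_speed = np.linalg.norm(eye_in_head_vel, axis=1)
    # Counter-rotation of the eyes against the head (vestibulo-ocular reflex)
    # tends to show up as negative yaw-velocity correlation.
    corr = np.corrcoef(head_vel[:, 0], eye_in_head_vel[:, 0])[0, 1]
    return np.array([h_speed.mean(), h_speed.max(),
                     e_speed.mean(), e_speed.max(),
                     np.nan_to_num(corr)])

# Train on labelled windows of recorded eye/head data before use:
# X = np.stack([window_features(h, e) for h, e in labelled_windows])
# y = labels  # 0 = Head-Gaze, 1 = Head-Gesture
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(X, y)

def update_cursor(gaze_cursor, head_delta, head_vel, eye_in_head_vel):
    """Hand cursor control from gaze to head only once the current movement
    is classified as a deliberate Head-Gesture, avoiding overshoot from the
    head's lagging contribution to the gaze shift."""
    label = clf.predict(window_features(head_vel, eye_in_head_vel)[None, :])[0]
    if label == 1:          # Head-Gesture: refine the cursor with the head
        return gaze_cursor + head_delta
    return gaze_cursor      # Head-Gaze: keep the cursor at the gaze position
```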
In other work on eye and head input in 3D, we have studied how gaze can be leveraged for viewport control in head-mounted displays (HMDs). HMDs function like a window to a virtual world that is fixed to the head, allowing users to explore their surroundings by turning their head and body accordingly. We investigated gaze as an alternative modality for viewport control, to enable users to explore virtual environments with less effort. We designed three techniques that leverage gaze based on different eye movements: a gaze dwell technique for viewport rotation in discrete steps, a gaze gain technique for amplified viewport rotation based on gaze angle, and a gaze pursuit technique for smooth rotation of the viewport to align objects of interest centrally. In a comparison with manual viewport control, we found our gaze-based techniques to be as effective while providing novel and diverse ways in which surround viewing can be supported with much reduced effort.
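As a rough sketch of the gaze gain idea, the example below amplifies the viewport rotation caused by a head turn in proportion to how far the gaze is deflected from the display centre; this is one plausible reading with illustrative parameters, not the exact technique evaluated in our comparison.

```python
def amplified_viewport_yaw(head_yaw_delta_deg, gaze_yaw_deg,
                           dead_zone_deg=10.0, gain_per_deg=0.05,
                           max_extra_gain=2.0):
    """Amplify the viewport rotation produced by a head turn in proportion
    to gaze eccentricity (how far the gaze is deflected from the display
    centre) in the same direction. Inside the dead zone, or when gaze and
    head move in opposite directions, the viewport follows the head 1:1.
    All parameter values are illustrative assumptions."""
    eccentricity = max(abs(gaze_yaw_deg) - dead_zone_deg, 0.0)
    same_direction = head_yaw_delta_deg * gaze_yaw_deg > 0
    if not same_direction or eccentricity == 0.0:
        return head_yaw_delta_deg
    gain = 1.0 + min(gain_per_deg * eccentricity, max_extra_gain)
    return head_yaw_delta_deg * gain
```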
Eye and Hand:
Whenever we use our hands for input, our eyes are implicitly involved in the action, typically by looking ahead toward the target of the manual input. We developed Gaze-Hand Alignment as a novel concept to take advantage of eye-hand coordination for input in 3D. We treat both gaze and hand input as pointers into the 3D environment, and alignment of one pointer with the other as a distinct event. In a series of studies, we have developed and evaluated a range of specific techniques that are based on alignment, for example for perspective pointing in 3D with a finger raised into the line of sight. From a user's perspective, the focus is on the manual action, while gaze is leveraged implicitly. The technique takes advantage of gaze for target pre-selection, as it naturally precedes manual input, and selection is then completed when manual input aligns with gaze on the target, without the need for an additional click. In a comparative evaluation, we found alignment techniques to outperform corresponding hands-only techniques without gaze support.
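The sketch below illustrates the alignment principle with simple ray tests: gaze pre-selects any target its ray falls on, and selection completes when the hand ray aligns on the same target; the spherical targets and geometry helpers are illustrative assumptions rather than the exact techniques studied.

```python
import numpy as np

def ray_hits(origin, direction, target_pos, target_radius):
    """True if the ray from `origin` along `direction` passes within
    `target_radius` of `target_pos` (closest-approach sphere test)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    to_target = np.asarray(target_pos, dtype=float) - np.asarray(origin, dtype=float)
    t = float(np.dot(to_target, d))
    if t < 0.0:
        return False  # target lies behind the ray origin
    closest = np.asarray(origin, dtype=float) + t * d
    return float(np.linalg.norm(closest - np.asarray(target_pos, dtype=float))) <= target_radius

def gaze_hand_alignment_select(eye_origin, gaze_dir, hand_origin, hand_dir, targets):
    """targets: list of (id, position, radius) tuples.
    Gaze pre-selects every target its ray falls on; selection fires for the
    first of those targets that the hand (or raised finger) ray also hits,
    so no separate click is required."""
    gaze_hits = {tid for tid, pos, r in targets
                 if ray_hits(eye_origin, gaze_dir, pos, r)}
    for tid, pos, r in targets:
        if tid in gaze_hits and ray_hits(hand_origin, hand_dir, pos, r):
            return tid  # gaze and hand aligned on this target -> select it
    return None
```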
We will continue to work on all of our major strands of research. Eye and hand interaction is an area in which we are beginning to see first breakthroughs towards adoption of research results into user interface platforms, but the work we are conducting on this topic is also leading to new fundamental questions that we had not originally considered, such as the effects of eye dominance on interaction in 3D. In work on eye and head coordination, we will further explore ergonomic aspects toward the design of techniques that reduce effort and increase viewing comfort. This work is of particular relevance for improving the usability of head-mounted displays. We will also continue to explore involuntary eye movements such as vergence, to develop their potential for interaction.