
Anticipatory Human-Computer Interaction

Periodic Reporting for period 2 - ANTICIPATE (Anticipatory Human-Computer Interaction)

Reporting period: 2020-08-01 to 2022-01-31

More than three decades of research on human-computer interaction (HCI) have resulted in significant advances in theories, tools, and methods to facilitate, support, and enhance interactions between humans and computing systems. HCI research has also paved the way for many highly important and commercially successful applications, such as touch input on the billions of mobile phones used worldwide, or the latest game consoles and car infotainment systems that can be controlled by body movements or hand gestures. However, despite the fundamental importance of HCI for the information society, and despite advances towards the grand HCI challenge of making interactions with computers human-like, current general-purpose user interfaces (UIs) still fall far short of one core human ability: Theory of Mind (ToM). ToM allows us to attribute mental states to others and to anticipate their actions, and is thus essential for us to interact naturally, effortlessly, and seamlessly with each other.

Two critical requirements for ToM are the impressive human abilities to understand others' attention and to predict their intentions. Deficits in ToM are closely linked to severe developmental disorders, such as autism. We argue that current general-purpose UIs are similarly mind-blind: they fail to sense users' attention and to predict their intentions, and therefore lack the ability to anticipate and proactively adapt to users' actions. This limits them to operating after the fact, i.e. to merely reacting to user input, which drastically restricts the naturalness, efficiency, and user experience of current interactions.

The overall objective of ANTICIPATE is to establish the scientific foundations for a new generation of user interfaces that implement ToM and are thus able to anticipate users' future actions and take action on their behalf. To this end, the project explores fundamental computational methods to sense users' attention and predict their intentions during interactions with general-purpose graphical UIs (i.e. UIs not specialised for a particular task, such as text entry), as well as innovative interaction paradigms for anticipatory UI adaptation. If successful, these UIs will appear near "magical" to users, as they will seem to read their minds and always be one step ahead. For example, such UIs could infer information that users intend to find, and that so far exists only in their minds, and proactively present it. By establishing anticipatory HCI as a strong complement to existing interaction and UI adaptation paradigms, we expect to significantly improve the naturalness, efficiency, and user experience of current human-computer interactions. As such, the project has the potential to drastically improve the billions of interactions that we all perform with computers every day and to enable new types of UI adaptation that are impossible today.

Work in the first half of the project has focused on the key requirements for anticipatory HCI, i.e. everyday attention sensing and the prediction of user intentions, as well as on forecasting interactive user behaviour. This work has been published at top venues in computer vision, human-computer interaction, and human vision.

In the context of attention sensing, we have been specifically interested in learning-based gaze estimation, i.e. the challenging task of estimating a person's point of gaze from face images recorded with an off-the-shelf camera. This approach is particularly promising because, unlike traditional methods, it requires no special-purpose eye-tracking equipment, and because cameras are integrated into an ever-increasing number of devices. The task is, however, highly challenging, given the significant variability in illumination conditions and facial appearance in everyday settings. Building on our own pioneering work in this area, we first studied in depth the key challenges of unconstrained gaze estimation and their impact on gaze estimation performance. We also proposed a new large-scale dataset to facilitate further development, as well as realistic evaluation, of future gaze estimation methods. Based on these analyses and leveraging the new dataset, we then proposed new gaze estimation methods that significantly improve over the state of the art in terms of both performance and robustness to challenging images and evaluation settings. We have packaged and released these methods as an open-source library (www.opengaze.org) to democratise the use of gaze estimation in HCI and beyond. We have also specifically studied mobile gaze estimation, i.e. estimating gaze using cameras integrated into mobile devices. To this end, we introduced another in-the-wild dataset and were, for the first time, able to quantify the highly dynamic nature of everyday visual attention across users, mobile applications, and usage contexts.
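
Gaze estimation performance is commonly reported as the mean angular error, in degrees, between predicted and ground-truth 3D gaze directions. The sketch below illustrates this evaluation measure only; it is not code from the released library. It assumes that gaze is given as (pitch, yaw) angles in radians and uses a camera-coordinate convention adopted by several public gaze datasets, in which pitch = yaw = 0 means looking straight into the camera.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
PitchYaw = Tuple[float, float]


def pitchyaw_to_vector(pitch: float, yaw: float) -> Vec3:
    """Convert gaze angles (radians) to a 3D unit vector in camera coordinates.

    Assumed convention: pitch = yaw = 0 corresponds to (0, 0, -1),
    i.e. looking straight into the camera.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred: PitchYaw, true: PitchYaw) -> float:
    """Angle in degrees between a predicted and a ground-truth gaze direction."""
    a = pitchyaw_to_vector(*pred)
    b = pitchyaw_to_vector(*true)
    cos_sim = sum(x * y for x, y in zip(a, b))  # both vectors have unit length
    cos_sim = max(-1.0, min(1.0, cos_sim))      # guard against rounding error
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds: Sequence[PitchYaw], trues: Sequence[PitchYaw]) -> float:
    """Mean angular error over a test set, the usual summary metric."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)
```

For example, `angular_error_deg((0.0, 0.0), (0.0, math.radians(5.0)))` is approximately 5.0, since the two directions differ only by a 5° change in yaw.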

In the context of intention prediction, we have focused both on the general intention prediction task and on predicting visual search intents as a particularly promising use case, also beyond HCI. We proposed the first-ever method to predict both the categories and the attributes of search intents from human gaze data. In a follow-up work, we went one step further and demonstrated that the search intent can not only be predicted but even visually reconstructed from gaze, similar to a photofit (facial composite) created during criminal investigations. Visually decoding images that exist only in people's minds (also known as mental image reconstruction) is profoundly challenging, given that the information required for this task is encoded in complex neural dynamics in the brain and is not easily accessible from the outside. Previous methods thus required sophisticated brain imaging setups, such as functional magnetic resonance imaging. Our work is groundbreaking in that it offers an alternative that requires only an eye tracker and is, as such, much more practical. In a human study, we showed that our method could decode photofits that are visually plausible and close to an observer's mental image. This line of work was complemented by an interdisciplinary work, conducted with colleagues from the Netherlands, in which we demonstrated that intents can also be predicted from pupil dilation, i.e. the fluctuations in pupil diameter that occur over time as visual stimuli are inspected.
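
To make the search-intent prediction setting more concrete, the sketch below ranks candidate values of each target attribute (for example colour or shape) by the total time users spent fixating items that carry that value. It rests on the common observation that distractors resembling the search target tend to attract more and longer fixations. This duration-weighted voting rule is a deliberately naive baseline and is not the method proposed in the project; the `Fixation` record, the `item_attributes` mapping, and all attribute names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fixation:
    item_id: int        # hypothetical: index of the on-screen item that was fixated
    duration_ms: float  # fixation duration in milliseconds


def rank_intent_attributes(
    fixations: List[Fixation],
    item_attributes: Dict[int, Dict[str, str]],
) -> Dict[str, List[str]]:
    """Rank values of each attribute by total fixation time (naive baseline).

    Returns, for every attribute name, its values sorted from most to least
    likely to belong to the searched-for target under this heuristic.
    """
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for fix in fixations:
        # Every attribute value of the fixated item receives a vote
        # weighted by how long the item was looked at.
        for attr, value in item_attributes[fix.item_id].items():
            scores[attr][value] += fix.duration_ms
    return {
        attr: sorted(values, key=values.get, reverse=True)
        for attr, values in scores.items()
    }
```

With three items `{0: red circle, 1: blue circle, 2: red square}` and fixations of 300 ms, 250 ms, and 120 ms on items 0, 2, and 1, this heuristic ranks "red" above "blue" and "circle" above "square".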

Finally, we have presented new methods for temporal forecasting of user behaviour, i.e. predicting the behaviour that is most likely to occur in the near future. Behaviour forecasting is an important prerequisite for interactive systems to adapt in an anticipatory way, i.e. before the user action has actually happened. In a first work, we introduced the first method to anticipate gaze behaviour in natural dyadic interactions. Our method analyses non-verbal facial cues and speaking behaviour and can anticipate gaze for different future time horizons. We empirically evaluated the method on a novel dataset of 121 YouTube videos of dyadic video conferences and showed that it clearly outperforms baselines. In a second work, we again investigated visual search, this time in four different immersive virtual reality environments. We collected a comprehensive dataset and proposed a novel learning-based method to forecast eye fixations in the near future, for both free-viewing (without a visual task) and task-oriented interactive settings.
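
Learned forecasting methods of this kind are typically compared against simple extrapolation baselines. The sketch below shows one such naive baseline, constant-velocity extrapolation of 2D gaze positions to several future horizons. It is not the project's method, and the report does not state which baselines were used; measuring horizons in samples at a fixed sampling rate is an assumption made here for simplicity.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def forecast_constant_velocity(
    history: Sequence[Point],
    horizons: Sequence[int],
) -> Dict[int, Point]:
    """Extrapolate the last observed gaze velocity to several future horizons.

    history:  gaze positions sampled at a fixed rate, oldest first.
    horizons: number of samples ahead to forecast (assumption), e.g. [1, 5, 10].
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate velocity")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per sample
    return {h: (x1 + vx * h, y1 + vy * h) for h in horizons}
```

Because it ignores saccades and scene content, such a baseline degrades quickly at longer horizons, which is precisely the gap that learning-based forecasting methods aim to close. Forecast quality can then be scored per horizon, for example as the Euclidean distance to the gaze position actually observed.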

In the second half of the project, we will continue to advance the fundamental methods for attention sensing and intention prediction. As these methods mature, we will increasingly investigate their application in interactive systems and start developing interaction paradigms for anticipatory user interface adaptation. The overall goal is to extend the methodological building blocks developed for attention sensing and intention prediction into a full-fledged, practical methodology for designing anticipatory human-computer interfaces. To significantly lower the barrier for practitioners to implement anticipatory UIs, and to facilitate future research, we further plan to consolidate all paradigms into a software toolkit that we will continuously extend over the course of the project.