Periodic Reporting for period 4 - ANTICIPATE (Anticipatory Human-Computer Interaction)
Reporting period: 2023-08-01 to 2024-01-31
Two critical requirements for Theory of Mind (ToM) are the human abilities to understand others' attention and to predict their intentions. Deficits in ToM are closely linked to severe developmental disorders, such as autism. We argue that current general-purpose user interfaces (UIs) are similarly mind-blind. That is, they fail to sense users' attention and predict their intentions, and therefore lack the ability to anticipate and pro-actively adapt to users' actions. This limits them to operating after the fact, i.e. to merely reacting to user input, which drastically restricts the naturalness, efficiency, and user experience of current interactions.
The overall objective of ANTICIPATE is to establish the scientific foundations for a new generation of user interfaces that implement ToM and are thus able to anticipate users' future actions and take action on users' behalf. To this end, the project explores fundamental computational methods to sense users' attention and predict their intentions during interactions with general-purpose graphical UIs (i.e. not specialised for a particular task, such as text entry), as well as innovative interaction paradigms for anticipatory UI adaptations. If successful, these UIs will appear nearly "magical" to users, as they will seem to read their minds and to always be one step ahead. For example, these UIs could infer the information that users intend to find, which so far resides only in their minds, and pro-actively present that information. By establishing anticipatory HCI as a strong complement to existing interaction and UI adaptation paradigms, we expect to significantly improve the naturalness, efficiency, and user experience of current human-computer interactions. As such, the project has the potential to drastically improve the billions of interactions that we all perform with computers every day and to enable new types of UI adaptation that are impossible today.
In the context of attention sensing, we have been specifically interested in learning-based gaze estimation, i.e. the task of estimating a person's point of gaze from face images recorded using an off-the-shelf camera. This task is highly challenging given the significant variability in illumination conditions and facial appearance in everyday settings. Building on our own pioneering work in this area, in this project we have first studied in depth the key challenges associated with unconstrained gaze estimation, as well as the impact that they have on gaze estimation performance. We have also proposed a new large-scale dataset to facilitate further development as well as the realistic evaluation of future gaze estimation methods. Based on these analyses and leveraging the new dataset, we have then proposed new gaze estimation methods that significantly improve over the state of the art, both in performance and in robustness to challenging images and evaluation settings. We have packaged and released these methods as an open-source library (www.opengaze.org) to democratise the use of gaze estimation in HCI and beyond. We have also specifically studied the task of mobile gaze estimation, i.e. estimating gaze using cameras integrated into mobile devices. To this end, we have introduced another in-the-wild dataset and were, for the first time, able to quantify the highly dynamic nature of everyday visual attention across users, mobile applications, and usage contexts.
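To make "gaze estimation performance" concrete, the following minimal sketch computes the angular error between predicted and ground-truth gaze directions. This is an illustration only: the summary above does not state which metric the project used, and angular error in degrees is assumed here because it is a common choice in the gaze estimation literature. The function names and the representation of gaze as 3D direction vectors are hypothetical and not taken from the project's library.

```python
import math


def angular_error_deg(pred, truth):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: gaze is represented as a (not necessarily normalised)
    3D direction vector; this representation is illustrative only.
    """
    dot = sum(p * t for p, t in zip(pred, truth))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in truth))
    if norm_p == 0 or norm_t == 0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, truths):
    """Mean angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    if not errors:
        raise ValueError("need at least one prediction")
    return sum(errors) / len(errors)
```

A lower mean angular error on held-out images would indicate better performance; evaluating the same function separately on difficult subsets (for example, images with poor illumination) is one way to probe robustness.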
In the context of intention prediction, we have focused both on the general intention prediction task and on the task of predicting visual search intents as a particularly promising use case, also beyond HCI. We have proposed the first-ever method to predict the categories as well as attributes of search intents from human gaze data. In a follow-up work, we have gone one step further and have demonstrated that the search intent can not only be predicted but even visually reconstructed from gaze - similar to a photofit (facial composite) created during criminal investigations. Visually decoding images that only exist in people's minds (also known as mental image reconstruction) is profoundly challenging given that the information required to succeed in this task is encoded in complex neural dynamics in the brain and not easily accessible from the outside. Our work is groundbreaking in that it offers an alternative that only requires an eye tracker and, as such, is much more practically useful. We have conducted a human study that showed that our method could decode photofits that are visually plausible and close to an observer's mental image. This line of work was complemented by interdisciplinary work conducted with colleagues from the Netherlands, in which we demonstrated that intents can also be predicted from pupil dilations - i.e. the fluctuations that occur in pupil diameter over time as visual stimuli are inspected.
Finally, we have presented new methods for temporal forecasting of user behaviour, i.e. the task of predicting the behaviour that is most likely to occur in the near future. Behaviour forecasting is an important prerequisite for interactive systems to be able to adapt in an anticipatory way, i.e. before the user action has actually happened. In a first work, we have introduced the first method to anticipate gaze behaviour in natural dyadic interactions. Our method analyses non-verbal facial cues and speaking behaviour and is capable of anticipating gaze for different future time horizons. We have empirically evaluated our method on a novel dataset of 121 YouTube videos of dyadic video conferences and have demonstrated that our method clearly outperforms baselines. In a second project, we have again investigated a visual search task, this time in four different immersive virtual reality environments. We have collected a comprehensive dataset and have proposed a novel learning-based method to forecast eye fixations in the near future for both free-viewing (without a visual task) and task-oriented interactive settings.
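The notion of forecasting "for different future time horizons" can be illustrated with a minimal sketch of how a behaviour time series might be split into training pairs of a past window and a future target. This is a generic sliding-window construction assumed for illustration; it is not the project's actual pipeline, and the function name and the parameters `history` and `horizon` are hypothetical.

```python
def make_forecast_pairs(sequence, history, horizon):
    """Build (past_window, future_value) pairs from a time series.

    sequence: list of per-frame observations (e.g. gaze targets).
    history:  number of past frames the model may look at.
    horizon:  how many frames ahead the target lies (1 = next frame).
    """
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be at least 1")
    pairs = []
    # Last start index for which both window and target fit in the series.
    last_start = len(sequence) - history - horizon
    for start in range(last_start + 1):
        past = sequence[start:start + history]
        target = sequence[start + history + horizon - 1]
        pairs.append((past, target))
    return pairs


# Usage: the same series yields different targets per horizon.
frames = [0, 1, 2, 3, 4, 5]
short_term = make_forecast_pairs(frames, history=3, horizon=1)
longer_term = make_forecast_pairs(frames, history=3, horizon=2)
```

Increasing `horizon` moves the target further into the future, so fewer pairs fit into the same series and the prediction task typically becomes harder.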