Unravelling perspective-taking: Examining the influence of multiple constraints on perspective selection and multimodal behavior through a dynamical systems' approach

Periodic Reporting for period 1 - Dynamic Perspectives (Unravelling perspective-taking: Examining the influence of multiple constraints on perspective selection and multimodal behavior through a dynamical systems' approach)

Reporting period: 2016-08-22 to 2018-08-21

People coordinate with one another routinely throughout daily life—from moving a couch in tandem, to planning a trip jointly, to putting together a collaborative grant. Many of these tasks involve considering the perspectives of others, including tracking where our task partners are in space, what they know, or what we have already discussed. Despite advances in our understanding of how people act jointly, the mental underpinnings of the fundamental social skill of perspective-taking remain underexplored.

The objectives of this project have been to investigate:
i. the time-course of perspective-taking
ii. the integration of different sources of information during perspective-taking
iii. the stabilization of task partners on a perspective choice

This project adopted a dynamical systems approach to examine the joint contribution of multiple cues (social and environmental) to perspective-taking, by conceptualizing them as weighted information evolving over time. By parameterizing cues that may constrain perspective choice (e.g. the orientation of task objects relative to social perspectives) or interpersonal coordination (e.g. task type), the project contributes to theoretical models of dialogue and joint action. By modelling perspective-taking and identifying behavioural signatures of successful coordination, the project can inform the protocols used in applied settings such as health care, aviation, and more.
One of the central debates in the study of dialogue and joint action concerns the time-course of perspective-taking: how quickly can language users take into account their task partner’s perspective? To address this debate, we used mouse-tracking, as it permits sampling language users’ behavior on a fine-grained temporal scale. Across a series of studies, we examined the dynamics of language users’ mouse-trajectories when responding to linguistic instructions from a simulated partner. We manipulated social cues (the partner’s perspective) and environmental cues (about the objects’ orientation or configuration) to investigate whether and when language users integrate different sources of information. Specifically, we asked whether the convergence of these cues (a) makes users more likely to adopt the other-centric perspective, and (b) facilitates other-centric responding, as indicated by the language users’ mouse-movements.

Consider a listener hearing the instruction “Give me the folder on the right” in an ambiguous visual context, where the utterance can be interpreted either egocentrically or other-centrically (selecting the folder on their right or the speaker's right, respectively). Would the listener be more likely to choose the other-centric folder when the orientation of the folders is aligned with the partner’s depicted perspective in the task environment? Moreover, would her response to reach the target folder be facilitated when the folders’ orientation is other-aligned vs. ego-aligned?

We found that introducing an other-aligned directional or configural cue to the task resulted in increased other-centrism. However, despite this shift in perspective preference, the other-centric responding was not facilitated in these contexts. Instead, across contexts, egocentric responders made faster and more direct mouse-trajectories to the target folders compared to other-centric responders.
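As an illustration of what "more direct" can mean for a mouse-trajectory, the sketch below computes one common mouse-tracking index: the maximum deviation of the cursor path from the straight line joining its start and end points. This is a minimal Python sketch under stated assumptions, not the project's own analysis code (which is in R); the function name `max_deviation` and the sample trajectories are hypothetical, and the report does not say which directness measures were actually used.

```python
import math


def max_deviation(trajectory):
    """Largest perpendicular distance of any sample from the straight
    line joining the first and last points of the trajectory.

    `trajectory` is a list of (x, y) cursor samples (hypothetical format).
    """
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end points coincide")
    # Point-to-line distance for every sample; keep the largest.
    return max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        for x, y in trajectory
    )


# Made-up example trajectories ending at the same target.
direct = [(0, 0), (1, 1), (2, 2), (3, 3)]   # straight path
curved = [(0, 0), (2, 0), (3, 1), (3, 3)]   # path that bows away first

print(max_deviation(direct))
print(max_deviation(curved))
```

A smaller value indicates a more direct movement toward the chosen folder; a larger value indicates that the cursor was drawn away from the straight path before settling on the response.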

We also conducted interactive studies, which “scaled up” the scope of the project to more naturalistic settings. In these interactive studies we examined how interpersonal coordination (e.g. in language use and body movement) is influenced by high-level task constraints (e.g. the partners’ relative body orientation) and whether it is predictive of task success.

In a direction-giving task, pairs interacted in two conditions: for one route description, direction givers (DGs) and direction followers (DFs) sat side-by-side (aligned), and for another they sat opposite one another (counter-aligned). After each description, DFs drew the route on a map. When counter-aligned (vs. aligned), DGs more frequently produced expressions from a survey perspective (e.g. using terms such as east-west), and DFs used more words per conversational turn. Although the accuracy of DFs’ drawings did not depend on the partners’ alignment, on spatial language use, or on individual spatial ability, we are currently examining whether coordination in body movement is predictive of task accuracy. To do so, we leverage automated video processing (frame-differencing methods) and quantify interpersonal coordination through time series analysis (cross-recurrence quantification analysis). This work can inform how task partners recruit different modalities (speech vs. body movement) in a complementary fashion to compensate for the difficulty of coordinating under different task constraints.
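The two analysis steps named above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the project's pipeline (whose code is in R): frames are represented as flat lists of pixel intensities, all function names are hypothetical, and the cross-recurrence step omits the time-delay embedding and the additional measures (e.g. determinism) that a full cross-recurrence quantification analysis would normally include.

```python
def frame_difference_series(frames):
    """Movement time series: summed absolute pixel change between
    each pair of consecutive frames."""
    return [
        sum(abs(a - b) for a, b in zip(prev, curr))
        for prev, curr in zip(frames, frames[1:])
    ]


def cross_recurrence_matrix(x, y, radius):
    """Cell (i, j) is 1 when x at time i and y at time j are within
    `radius` of each other, else 0."""
    return [[1 if abs(xi - yj) <= radius else 0 for yj in y] for xi in x]


def recurrence_rate(matrix):
    """Proportion of recurrent cells in the whole matrix."""
    cells = [c for row in matrix for c in row]
    return sum(cells) / len(cells)


def diagonal_profile(matrix, max_lag):
    """Recurrence rate along each diagonal, i.e. at each relative lag
    (lag = j - i) between the two series."""
    n, m = len(matrix), len(matrix[0])
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        cells = [matrix[i][i + lag] for i in range(n) if 0 <= i + lag < m]
        profile[lag] = sum(cells) / len(cells) if cells else 0.0
    return profile


# Made-up movement series for two partners; y leads x by one sample.
x = [0, 1, 2, 3]
y = [1, 2, 3, 4]
crp = cross_recurrence_matrix(x, y, radius=0)
print(recurrence_rate(crp))
print(diagonal_profile(crp, max_lag=1))
```

In this toy example the recurrence is concentrated on a single off-centre diagonal, which is the kind of pattern that would indicate one partner's movement systematically leading the other's.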

Finally, we have developed a computational model of perspective-taking. This dynamical model extends a bi-stable attractor model (with the “egocentric” and “other-centric” perspectives serving as attractor wells) to account for the interaction of both “slow” and “fast” processes in perspective-taking.
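To make the idea of a bi-stable attractor concrete, the sketch below integrates a generic textbook double-well system, dx/dt = x - x^3 + k, whose two stable states stand in for the two perspectives. It is only an illustration under stated assumptions: the equation, the mapping of x = -1 to "egocentric" and x = +1 to "other-centric", the tilt parameter `k` (standing in for the weight of an other-aligned cue), and the function name `settle` are all hypothetical, and the sketch does not implement the project's extension to interacting "slow" and "fast" processes.

```python
def settle(x0, tilt=0.0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x - x**3 + tilt from the initial state x0
    and return the state after `steps` time steps."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3 + tilt)
    return x


# Without a tilt, the final state depends on the starting side:
print(settle(-0.1))            # settles in the negative ("egocentric") well
print(settle(0.1))             # settles in the positive ("other-centric") well

# A sufficiently strong tilt removes the negative well, so even a state
# that starts on the egocentric side ends up other-centric:
print(settle(-0.1, tilt=0.5))
```

In this toy system the tilt plays the role of a weighted cue: small values only bias which well is deeper, while values above roughly 0.385 leave a single attractor.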

The findings of this project have been presented at scientific meetings (Annual Meeting of the Psychonomic Society 2017, 2018; Conference of the European Society for Cognitive Psychology, 2017), workshops, speaker series, and departmental colloquia (CogNetwork Lecture Series, UC Berkeley; Workshop on Dynamics of Language, UC Santa Barbara, UNC Chapel Hill).

Three articles have already been published, in the Journal of Memory and Language (DOI: 10.1016/j.jml.2018.08.007), in Language, Cognition, and Neuroscience (DOI: 10.1080/23273798.2017.1384029), and in Frontiers in Psychology (DOI: 10.3389/fpsyg.2018.01278), and one additional manuscript is currently available as a preprint (

A total of 5 completed projects associated with this action have public repositories on OSF or GitHub.
The findings of the project can inform the design of a number of technological applications that require tracking and updating the users’ perspectives. Technical developments stemming from the findings of this project could be commercialized downstream to improve spoken dialogue systems, embodied agents, GPS navigational devices, navigational tools, and educational materials. This can also inform protocols used in the field, in healthcare, traffic control, battlefield communication, search and rescue, and more. In these settings, where perspective-taking errors can have grave consequences, developing automatic routines to identify multimodal signatures that are diagnostic of miscommunication would be especially valuable.

Anonymized and deidentified data for all completed projects, along with open source code in R for data analysis or modeling, are available on open repositories on OSF or GitHub to facilitate data access and reuse.