Final Report Summary - AIDA (Assessments of InDividual behaviour Analysed critically)

Assessments of individuals’ behaviour and personality play important roles in research and organisations. For efficient and standardised assessments, questionnaire scales are widely used. But how people actually understand these tools and how they generate their answers to standardised scales were still poorly understood. Little was known, too, about how well people’s verbal reports represent observable individual behaviours and in what ways they may reflect social beliefs and stereotypes.

The AIDA project explored how people assess and socially categorise individuals and critically analysed standardised questionnaires widely used for assessments. A particular focus was on potential biases derived from stereotypes about gender (Woman and Man) and ethnicity, with “Black” and “White” taken as prototypical categories of ethnicity that are at the centre of many social conflicts worldwide.

The formation of personality impressions has so far largely been studied in terms of agreement between different raters who were shown photos or behavioural sequences of people. This project went a step further. A novel transdisciplinary paradigm, cutting-edge interview techniques involving first-person videos, as well as quantitative and qualitative methodologies were used to systematically deconstruct the requirements that standardised assessment tasks impose on respondents and to reconstruct the social knowledge and the psychical processes involved therein.

A short version of a popular Big Five personality inventory (BFI-10) was explored in two assessment-relevant contexts: in online surveys with photos (Online Study) and in interpersonal discourses about videotaped behaviours (Video Study). The target person to be assessed was either a Black or a White, Female or Male person (ethnicity x gender in a 2 x 2 design). The same target persons were used for the two complementary studies.

In the *Online Study*, participants were asked first to rate an unknown person shown in a photo on the personality scales and then to explain what they had considered in their judgements and how they understood each item. Every participant saw just one of the four target persons. The sample comprised N = 120 survey participants; their open-ended responses to the various questions about each single personality item produced a substantial corpus of 64,901 words.

In the *Video Study*, participants came to the lab with a friend, colleague or acquaintance. There, they jointly watched a short film showing three scenes with diverse leadership behaviours displayed with the same intensity by the four different protagonists. Every interview pair saw just one film, thus only one of the four target persons (e.g. only the White Female protagonist). Participants were asked to rate the protagonist on the questionnaire scales. For evidence-based and in-depth explorations of the psychical processes involved in assessment generation, Subjective Evidence-Based Ethnography (SEBE) was applied. First, the participants' subjective views of these film scenes and their own rating activities on the scales in front of them were recorded with miniature cameras worn at eye level (called SubCams). Thereafter, participants were interviewed together about their own first-person recordings, which activates episodic memory and enables in-depth reconstruction of the psychical processes during assessment generation. This audio-visual interview technique helped participants tremendously to reconstruct in detail how they had made their ratings, what they had considered in the film, and how they had interpreted the behaviours and the personality scales. The sample comprised N = 80 persons; their transcribed interviews constitute a very substantial corpus of 577,616 words.

The textual materials were analysed with multiple methods to generate both qualitative and quantitative data about the same participants and items. The findings from the two studies and their different samples converged notably, establishing solid evidence for the processes revealed.

Participants’ interpretations of the same standardised item generally varied considerably, indicating broad fields of meaning. But each participant focussed on only some aspects; thus, different participants based their ratings on different interpretations of the same item. This challenges the widespread belief that standardised items can establish comparability of the ratings generated by different persons.

Participants also generally focussed on very different pieces of evidence that could be taken from the photos and from the different scenes of the films, respectively; but each participant considered only a few of them. Moreover, participants generally interpreted the same piece of evidence (e.g. business clothes) differently and therefore drew different conclusions about the target person’s personality. Participants had no information about the target person apart from a small passport-style photo showing a rather neutral face (Online Study) or inconsistent information from a behavioural film clip of just 4 minutes (Video Study). These conclusions therefore provide insight into the raters’ social ideas and beliefs about individuals.

Although judgements and interpretations generally varied greatly for each target person, some tendencies emerged among them. Analyses yielded three subtle processes by which raters’ implicit beliefs about persons of a particular gender and ethnicity influenced their impression formation and their assessments of the target person’s personality.

Depending upon the target person’s gender and ethnicity, participants sometimes tended to
1) focus on different pieces of evidence visually or audio-visually available (selective focus),
2) weigh the evidence considered differently (differential weighting), and
3) interpret the same standardised item differently (variable interpretation).

The three processes were often only moderate in effect and not pervasive across all the personality aspects identified and all items considered. Nevertheless, they reveal subtle differences that, in their entirety, can add up to manifest differences in the assessments of persons. Such differences are based entirely on the raters’ implicit beliefs and stereotypes, triggered by a few visible cues about the target person’s gender and ethnicity (and age).

These findings provide empirical evidence for psychical and social processes that can lead to the frequently reported feelings of minority group members of being judged differently from majority group members for doing the same things. The findings also highlight limitations in the ability of questionnaire methods to enable standardised and thus comparable assessments of persons. This has important consequences for their application in research and in applied settings, such as diversity management and promotion.

The two studies also explored how people construct their own and others' ethnicity and how they express these ideas in the predetermined social categories widely used in standardised enquiries. These and further analyses are still in progress.