CORDIS - EU research results

Generalization in Mind and Machine

Periodic Reporting for period 3 - M and M (Generalization in Mind and Machine)

Reporting period: 2020-09-01 to 2022-02-28

There is widespread interest in deep neural networks, which are making impressive strides in solving difficult tasks such as object recognition, speech recognition, language translation, and game playing, amongst many others. This raises the obvious question of whether these networks solve these problems in a human-like manner, and thus tell us something important about the human brain, or whether they solve these tasks in qualitatively different ways. This project compares the performance of various neural network models to that of humans across a range of domains in which we have a good understanding of human performance. We are specifically concerned with the question of how well these models generalize in the domains of vision, memory, problem solving, and language, and how their performance compares to that of humans. We are also interested in modifying models to make them more human-like, and we consider whether it is necessary to introduce more psychological and neurological constraints to these models, including “symbolic” machinery that is often claimed to be a core part of human cognition.

The impressive success of deep networks has led to billions of euros of investment by a wide range of companies attempting to solve important societal tasks, from medical diagnosis and language translation to self-driving cars. It is also important to better understand how these models work, and whether they inform us about how the human brain works. A better understanding of the relationship between these models and humans holds the promise not only of improving these models but also of advancing our knowledge in psychology and neuroscience, all important issues for society.

The overall objectives are:

1) Compare the performance of networks and humans in a range of cognitive domains in order to assess whether these models solve problems in a human-like way.
2) Compare the internal representations that networks and humans use to solve various tasks.
3) Carry out behavioural studies that carefully assess human generalization in the domains of vision and language, and compare how well neural networks account for human performance.
4) When models fail to capture human performance, add various cognitive and biological constraints to the models to make their performance more human-like.
5) By adding cognitive and biological constraints to models, we hope to improve their performance. This will be relevant not only to understanding humans, but also to engineers and computer scientists who are concerned only with network performance (regardless of whether the models are similar to humans).
6) Develop new benchmark tests ("mind score") to assess how well models explain the psychology of cognition. These will be made available to other research teams so they can easily assess the psychological plausibility of their models.

In the domain of vision, we have carried out a variety of projects, including:

a) We have shown that the human visual system supports much greater “on-line” translation invariance than previously claimed. That is, once a person can identify a novel object projected to one retinal location, they can identify this object at a wide range of retinal locations. We have also shown that standard deep convolutional networks fail to support human-like translation invariance unless trained on the relevant transformations. With appropriate training, DNNs can support invariance to translation, scale, rotation in the picture plane, and rotation in the depth plane (Biscione & Bowers, 2021, 2022).

b) We have shown that deep neural network models often recognize objects in ways very different from humans. In one series of studies, we showed that models will readily identify objects on the basis of a single diagnostic pixel rather than relying on the shape of an object (Malhotra et al., 2020). In a follow-up study, we directly compared the features that humans and DNNs rely on when learning to classify novel objects that can be classified based on shape or a range of non-shape features. Again, models do not perform like humans (Malhotra et al., in press).

c) We have shown that deep neural networks of vision do not encode objects by their parts and the relations between the parts. This again contrasts with humans, who explicitly encode object parts and their relations (Malhotra et al., 2022).

d) When we add a biological constraint to convolutional neural networks (adding ‘edge detectors’ much like simple cells in visual cortex), they behave in a more human-like way and do not pick up on single pixels to identify objects. The models appear to rely more on shape, given that they are better at identifying line drawings and silhouettes that contain only shape information. However, they still fail to show a shape bias in many contexts (Evans et al., 2022; Malhotra et al., 2020).

e) We have shown that the adversarial images that fool networks do not fool humans in a similar way. This challenges the findings of a recent paper published in Nature Communications (Dujmović et al., 2020).

f) We have found that deep networks have super-human capacities to identify unstructured data (e.g. learning to categorize patterns that look like TV static to humans), a finding that also highlights the disconnect between human and machine vision. We have added some biological constraints to standard networks (e.g. adding noise to the activation of hidden layers, adding resource bottlenecks) to reduce the models' capacity to learn unstructured data while preserving their ability to identify structured data such as photographs (Tsvetkov et al., 2022).

g) We have shown that the internal representations in deep networks of object identification are less similar to those of humans than is commonly claimed on the basis of Representational Similarity Analysis (RSA). Indeed, we show conditions in which networks designed to classify objects in a qualitatively different way from humans nevertheless show high RSA scores with brain activations (Dujmović et al., 2020).

h) We have found that DNNs do not support a range of Gestalt organizational principles that are central to human perception and object recognition (Biscione & Bowers, 2022).
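
As a rough illustration of the Representational Similarity Analysis mentioned in item g), the sketch below computes an RSA score between two sets of activation patterns. It is not the project's analysis code. The dissimilarity measure (1 minus Pearson correlation), the Spearman comparison of the two dissimilarity matrices, the function names, and the toy activation values are all assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rdm_upper(patterns):
    """Upper triangle of a representational dissimilarity matrix (RDM).

    patterns: one activation vector per stimulus.
    Dissimilarity is 1 - Pearson correlation (an assumed, common choice).
    """
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]

def ranks(values):
    """Average ranks (tied values share their mean rank)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the RDMs of two systems."""
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))

# Hypothetical responses of two systems to the same four stimuli.
# The systems may have different numbers of units (4 versus 3 here).
model_acts = [[1.0, 0.0, 0.2, 0.5], [0.9, 0.1, 0.4, 0.3],
              [0.0, 1.0, 0.8, 0.1], [0.2, 0.7, 0.9, 0.6]]
brain_acts = [[0.3, 0.9, 0.1], [0.4, 0.8, 0.3],
              [0.9, 0.1, 0.5], [0.7, 0.2, 0.9]]

self_score = rsa_score(model_acts, model_acts)
cross_score = rsa_score(model_acts, brain_acts)
```

Because the score depends only on the pattern of pairwise dissimilarities between stimuli, two systems that rely on quite different features can in principle still obtain a high score, which is the concern raised in item g).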

In the domain of language and reasoning, we have carried out a number of projects, including:

a) We have developed a new empirical method to assess how well models of word naming generalize to novel words. This method will make it much easier to carry out future studies and to evaluate computational models (Gubian et al., in press).

b) We have shown that recurrent networks do not capture some fundamental syntactic generalizations that humans make, including the well-known ‘Principle-C’ constraint (Mitchell & Bowers, 2019), and that they can learn unstructured languages that are unlike any human language and that would be difficult, if not impossible, for humans to learn (Mitchell & Bowers, 2020).

c) We have shown that standard DNNs that have been claimed to support same/different visual reasoning in fact fail when tested appropriately (Puebla & Bowers, 2021), and that even models designed to solve relational reasoning tasks fail (Puebla & Bowers, in preparation).

d) We have developed new network architectures that support more widespread generalizations than standard models. This includes solving tasks that Gary Marcus claims require symbols. In these models we used convolutions in a novel way (Evans & Bowers, 2021).

e) We have introduced a new way of representing symbols in conventional network architectures (so-called VARS representations) that allows networks to better support combinatorial generalization in a simple memory task and in a visual reasoning task (Vankov & Bowers, 2019).

f) We have found that networks that learn disentangled representations continue to fail in combinatorial generalization tasks (Montero et al., 2021, 2022).

g) We have found that models of spoken word identification show some similarities to human speech processing, but they also differ in fundamental ways, suggesting that current models are missing key features of the human system (Adolfi et al., 2022).
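
To make the notion of a combinatorial generalization test (items e and f) concrete, here is a minimal sketch of how such a train/test split can be constructed. The factor names, values, and helper functions are hypothetical and are not taken from the cited studies. The idea is that every individual factor value appears during training, while at least one combination of values is withheld for testing.

```python
from itertools import product

def combinatorial_split(factors, held_out):
    """Split all factor combinations into training and test sets.

    factors: dict mapping a factor name to its list of values.
    held_out: predicate on a combination; True sends it to the test set.
    """
    names = list(factors)
    train, test = [], []
    for values in product(*(factors[name] for name in names)):
        combo = dict(zip(names, values))
        (test if held_out(combo) else train).append(combo)
    return train, test

def all_values_seen(factors, train):
    """Check that every individual factor value occurs in training."""
    return all(
        any(combo[name] == value for combo in train)
        for name, values in factors.items()
        for value in values
    )

# Hypothetical stimulus factors, for illustration only.
factors = {
    "shape": ["square", "circle", "heart"],
    "colour": ["red", "green", "blue"],
}

# Withhold one novel combination: blue hearts never appear in training,
# although "blue" and "heart" each appear with other values.
train, test = combinatorial_split(
    factors, lambda c: c["shape"] == "heart" and c["colour"] == "blue"
)
```

A model that generalizes combinatorially should handle the withheld combination in `test` after seeing only the combinations in `train`. The check in `all_values_seen` guards against splits in which a factor value is never seen at all, which would test a different kind of generalization.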

We have identified numerous ways in which current state-of-the-art models of vision succeed in non-human-like ways, and we have started to add biological constraints to models in order to make them perform more like humans. Our goal is to develop a "mind-score" benchmark that will provide a systematic and simple way to assess models in terms of their psychological plausibility. We will start with a "mind-score vision" benchmark test, but we also plan to develop a "mind-score language" test. Constraints that we expect to be particularly important include: adding noise to the activation of units, adding various processing limitations, adding symbolic machinery or learning signals to models, and adding time constraints to the processing and learning of new information. Better training environments will also be key. We have already shown that adding noise to hidden layers makes network models of vision behave in a more psychologically plausible manner, and that novel uses of convolutions can improve generalization in classic test conditions on which standard networks fail. Our recent work analyzing the hidden layers of networks suggests that current state-of-the-art networks do not have internal representations that are similar to those of humans (contrary to a number of recent high-profile papers). Our ultimate goal is to highlight the importance of psychological data in constraining network models of vision, language, memory, and problem solving. We have already identified all the tests for the mind-score vision test and are in the process of testing current models on this new benchmark.
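
One of the constraints listed above, adding noise to the activation of units, can be sketched as follows. This is a toy illustration and not the networks used in the project. The layer size, the weights, the use of additive Gaussian noise, and the choice to add it after the ReLU nonlinearity are all assumptions.

```python
import random

def noisy_hidden_layer(inputs, weights, biases, noise_sd, rng):
    """ReLU hidden layer with Gaussian noise added to each activation.

    weights: one list of input weights per hidden unit.
    noise_sd: standard deviation of the additive noise (0 disables it).
    rng: a random.Random instance, so results can be reproduced.
    """
    activations = []
    for unit_weights, bias in zip(weights, biases):
        pre = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        # Assumed placement: noise is added after the nonlinearity.
        activations.append(max(0.0, pre) + rng.gauss(0.0, noise_sd))
    return activations

# Hypothetical two-unit layer with hand-picked weights.
rng = random.Random(0)
inputs = [1.0, 2.0]
weights = [[0.5, -1.0], [1.0, 1.0]]
biases = [0.0, 0.5]

clean = noisy_hidden_layer(inputs, weights, biases, 0.0, rng)
noisy = noisy_hidden_layer(inputs, weights, biases, 0.5, rng)
```

With `noise_sd` set to zero the layer is deterministic. With a positive value, each presentation of the same input yields slightly different activations, so a network can no longer rely on arbitrarily fine distinctions between hidden states.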
Poster by Blything et al. at Cognitive Science (2019)
Photo of research group on away day
Poster by Malhotra et al. at Cognitive Science (2019)
Poster by Llera Montero et al. at UK Neural Computation (2019)