
Generalization in Mind and Machine

Periodic Reporting for period 2 - M and M (Generalization in Mind and Machine)

Reporting period: 2019-03-01 to 2020-08-31

"There is widespread interest in deep neural networks that are making impressive strides in solving difficult tasks such as object recognition, speech recognition, language translation, game playing, amongst many other tasks. This raises the obvious question as to whether these networks are solving these problems in a human-like manner and thus tell us something important about the human brain, or whether these networks are solving these tasks in qualitatively different ways. This project compares the performance of various neural network model to humans across a range of domains in which we have a good understanding of human performance. We are specifically concerned with question of how well these models generalize in the domain of vision, memory, problem solving, and language, and comparing performance to humans. We are also interested in modifying models to make them more human-like, and consider whether it is necessary to introduce more psychological and neurological constraints to these models, including “symbolic” machinery that is often claimed to be a core part of human cognition.

The impressive success of deep networks has led to billions of euros of investment by a wide range of companies attempting to solve important societal tasks, from medical diagnosis and language translation to self-driving cars. It is also important to better understand how these models work, and whether they inform us about how the human brain works. A better understanding of the relationship between these models and humans holds the promise not only of improving these models, but also of advancing our knowledge in psychology and neuroscience, all important issues for society.

The overall objectives are:

1) Compare the performance of networks and humans in a range of cognitive domains in order to assess whether these models solve problems in a human-like way.
2) Compare the internal representations that networks and humans use to solve various tasks.
3) Carry out behavioural studies that carefully assess human generalization in the domains of vision and language, and compare how well neural networks account for human performance.
4) When models fail to capture human performance, add various cognitive and biological constraints to the models to make their performance more human-like.
5) By adding cognitive and biological constraints to models, we hope to improve the performance of the models. This will not only be relevant to understanding humans, but also useful for engineers and computer scientists who are only concerned with network performance (regardless of whether the models are similar to humans).
6) Develop new benchmark tests ("mind score") to assess how well models explain the psychology of cognition. These will be made available to other research teams so they can easily assess the psychological plausibility of their models."
"In the domain of vision, we have carried out a variety of projects, including:

a) We have shown that the human visual system supports much greater “on-line” translation invariance than previously claimed. That is, once a person can identify a novel object projected to one retinal location, the person can identify this object at a wide range of retinal locations. We have also shown that standard deep convolutional networks fail to support human-like translation invariance unless we add special hand-wired mechanisms (a ‘global average pooling’ layer) to the network.

b) We have shown that deep neural network models often recognize objects in ways very different from humans. In one series of studies we have shown that models will readily identify objects on the basis of a single diagnostic pixel rather than relying on the shape of an object.

c) We have shown, in a series of simulations that directly compare shape and non-shape features that are diagnostic of category membership, that deep neural network models of vision do not show the "shape bias" that characterizes human vision.

d) When we add a biological constraint to convolutional neural networks (adding ‘edge detectors’ much like simple cells in visual cortex), they behave in a more human-like way and do not pick up on single pixels to identify objects. The models appear to rely more on shape, given that they are better at identifying line drawings and silhouettes that only contain shape information. However, they still fail to show a shape bias in many contexts.

e) We have shown that deep network models of vision do not encode the relations between object parts when identifying objects. This also contrasts with human vision.

f) We have shown that the adversarial images that fool networks do not fool humans in a similar way. This challenges the findings from a recent paper published in Nature Communications.

g) We have found that deep networks have super-human capacities to identify unstructured data (e.g. they learn to categorize patterns that look like TV static to humans), a finding that also highlights the disconnect between human and machine vision. We have added some biological constraints to standard networks (e.g. adding noise to the activation of hidden layers, adding resource bottlenecks) to reduce the models' capacity to learn unstructured data while preserving their ability to identify structured data such as photographs.

h) We have carried out preliminary studies showing that the internal representations in deep network models of object identification are less similar to those of humans than is commonly claimed on the basis of Representational Similarity Analysis.
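
As an illustration of the global average pooling mechanism mentioned in item (a), the following is a minimal sketch and not the project's actual model. It assumes a toy 6x6 image, a single hand-picked 2x2 filter, and wrap-around (circular) borders so that the invariance is exact; all function names are hypothetical. It shows why averaging a feature map over all positions yields the same value wherever the object appears, whereas a flattened feature map changes with position.

```python
def circular_conv2d(image, kernel):
    """Cross-correlate image with kernel (wrapping at the borders), then apply ReLU."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    total += kernel[dy][dx] * image[(y + dy) % h][(x + dx) % w]
            out[y][x] = max(total, 0.0)  # ReLU non-linearity
    return out


def flatten(feature_map):
    """Position-dependent read-out: keep every location as a separate feature."""
    return [v for row in feature_map for v in row]


def global_average_pool(feature_map):
    """Position-independent read-out: average the feature map over all locations."""
    values = flatten(feature_map)
    return sum(values) / len(values)


def shift(image, dy, dx):
    """Translate the image by (dy, dx) pixels with wrap-around."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


# Toy "object": three bright pixels on a dark 6x6 background (illustrative data).
image = [[0.0] * 6 for _ in range(6)]
image[1][1] = image[1][2] = image[2][1] = 1.0

# Hand-picked 2x2 filter that responds to a bright-to-dark change from left to right.
kernel = [[1.0, -1.0], [1.0, -1.0]]

original = circular_conv2d(image, kernel)
moved = circular_conv2d(shift(image, 3, 2), kernel)

gap_original = global_average_pool(original)
gap_moved = global_average_pool(moved)

print("pooled features equal:", abs(gap_original - gap_moved) < 1e-12)
print("flattened features equal:", flatten(original) == flatten(moved))
```

With real images and non-circular borders the invariance would only be approximate, so this sketch should be read as a statement about the pooling operation itself rather than about any trained network.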


In the domain of language and reasoning, we have carried out a number of projects, including:

a) We have developed a new empirical method to assess how well models of word naming can generalize to novel words. This method will make it much easier to carry out future studies and to evaluate computational models.

b) We have shown that recurrent networks do not capture some fundamental syntactic generalizations that humans do, including the well-known ‘Principle-C’ constraint.

c) We have shown that recurrent models of language production can learn both possible and impossible human languages. This strongly suggests that the success of these models is not based on human-like mechanisms.

d) We have developed new network architectures that support more widespread generalizations than standard models. This includes solving tasks that Gary Marcus claims require symbols. In these models we used convolutions in a novel way.

e) We have introduced a new way of representing symbols in conventional network architectures (so-called VARS representations) that allows networks to better support combinatorial generalization in a simple memory task and in a visual reasoning task.

f) We have compared recurrent networks with standard long short-term memory units to recurrent networks with a more biologically inspired version of long short-term memory units. We have found that the biologically inspired version performs more like humans in some conditions."
"We have identified numerous ways in which current state-of-the-art models of vision succeed in non human-like ways, and we have started to add biological constraints to models in order to make models perform more like humans. Our goal is to develop a ""mind score"" benchmark that will provide a systematic and simple way to assess models in terms of their psychological plausibility. We will start with a ""mind score vision"" benchmark test, but we also plan to develop ""mind score language"" test. Constraints that we expect will be particularly important will include: adding noise to the activation of units, adding various processing limitations, adding symbolic machinery or learning signals to models, and adding time constraints to the processing and learning of new information. We have already shown how the addition of noise in hidden layers makes network model of vision behave in a more psychologically plausible manner, and we have shown that novel uses of convolutions can improve generalization in classic test conditions that standard networks fail. Our recent work on analyzing the hidden layers of networks suggest that current state-of-the art networks do not have internal representations that are similar to humans (contrary to a number of recent high-profile papers). Our ultimate goal is to highlight the importance of psychological data in constraining network models of vision, language, memory, and problem solving."