Periodic Reporting for period 2 - M and M (Generalization in Mind and Machine)
Reporting period: 2019-03-01 to 2020-08-31
The impressive success of deep networks has led to billions of euros of investment by a wide range of companies attempting to solve important societal tasks, from medical diagnosis and language translation to self-driving cars. It is also important to better understand how these models work, and whether they inform us about how the human brain works. A better understanding of the relationship between these models and humans holds the promise not only of improving the models, but also of advancing our knowledge in psychology and neuroscience, all important issues for society.
The overall objectives are:
1) Compare the performance of networks and humans in a range of cognitive domains in order to assess whether these models solve problems in a human-like way.
2) Compare the internal representations that networks and humans use to solve various tasks.
3) Carry out behavioural studies that carefully assess human generalization in the domains of vision and language, and assess how well neural networks capture human performance.
4) When models fail to capture human performance, add various cognitive and biological constraints to the models to make their performance more human-like.
5) By adding cognitive and biological constraints to models we hope to improve the performance of models. This will not only be relevant to understanding humans, but also useful for engineers and computer scientists who are only concerned with network performance (regardless of whether the models are similar to humans).
6) Develop new benchmark tests ("mind score") to assess how well models explain the psychology of cognition. These will be made available to other research teams so they can easily assess the psychological plausibility of their models.
a) We have shown that the human visual system supports much greater “on-line” translation invariance than previously claimed. That is, once a person can identify a novel object projected to one retinal location, that person can identify the object at a wide range of retinal locations. We have also shown that standard deep convolutional networks fail to support human-like translation invariance unless we add special hand-wired mechanisms (a ‘global average pooling’ layer).
b) We have shown that deep neural network models often recognize objects in ways very different from humans. In one series of studies we have shown that models will readily identify objects on the basis of a single diagnostic pixel rather than relying on the shape of an object.
c) We have shown, in a series of simulations that directly compare shape and non-shape features that are diagnostic of category membership, that deep neural network models of vision do not show the "shape bias" that characterizes human vision.
d) When we add a biological constraint to convolutional neural networks (adding ‘edge detectors’ much like simple cells in visual cortex), they behave in a more human-like way and do not pick up on single pixels to identify objects. The models appear to rely more on shape, given that they are better at identifying line drawings and silhouettes that only have shape information. However, they still fail to show a shape bias in many contexts.
e) We have shown that deep networks of vision do not encode the relations between object parts when identifying objects. This also contrasts with human vision.
f) We have shown that the adversarial images that fool networks do not fool humans in a similar way. This challenges the findings from a recent paper published in Nature Communications.
g) We have found that deep networks have super-human capacities to learn unstructured data (e.g. they learn to categorize patterns that look like TV static to humans), a finding that also highlights the disconnect between human and machine vision. We have added some biological constraints to standard networks (e.g. adding noise to the activation of hidden layers, adding resource bottlenecks) to reduce the models' capacity to learn unstructured data while preserving their ability to identify structured data such as photographs.
h) We have carried out preliminary studies showing that the internal representations that deep networks use for object identification are less similar to those of humans than is commonly claimed on the basis of Representational Similarity Analysis.
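For readers unfamiliar with the Representational Similarity Analysis mentioned in h), the following is a minimal sketch of the commonly used form of the method, not the project's own analysis pipeline. It assumes each system (e.g. a network layer and a set of brain recordings) is summarized by one activation pattern per stimulus, that dissimilarity between two patterns is measured as 1 minus their Pearson correlation, and that the two resulting dissimilarity structures are compared with a Spearman rank correlation. All function names and the toy activation values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rdm_upper(patterns):
    """Upper triangle of the dissimilarity matrix (1 - Pearson r) over stimuli."""
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between two systems' dissimilarity structures."""
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))


# Toy data (invented): 4 stimuli, 3 units per system.
system_a = [[1.0, 0.0, 2.0], [0.9, 0.1, 2.1], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0]]
# Same geometry as system_a, but every activation is rescaled and shifted.
system_b = [[2 * v + 1 for v in row] for row in system_a]
# Same patterns as system_a, but assigned to different stimuli.
system_c = [system_a[0], system_a[2], system_a[1], system_a[3]]

print(rsa_score(system_a, system_b))
print(rsa_score(system_a, system_c))
```

In this sketch the score depends only on the relative dissimilarities between stimuli, so system_b matches system_a despite having different raw activation values, whereas system_c does not. A high score therefore says that two systems order stimulus pairs similarly; it does not by itself show that they rely on the same features.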
In the domain of language and reasoning, we have carried out a number of projects, including:
a) We have developed a new empirical method to assess how well models of word naming can generalize to novel words. This method will make it much easier to carry out future studies and to evaluate computational models.
b) We have shown that recurrent networks do not capture some fundamental syntactic generalizations that humans make, including the well-known ‘Principle C’ constraint.
c) We have shown that recurrent models of language production can learn both possible and impossible human languages. This strongly suggests that the success of these models is not based on human-like mechanisms.
d) We have developed new network architectures that support more widespread generalizations than standard models. This includes solving tasks that Gary Marcus claims require symbols. In these models we used convolutions in a novel way.
e) We have introduced a new way of representing symbols in conventional network architectures (so-called VARS representations) that allows networks to better support combinatorial generalization in a simple memory task and in a visual reasoning task.
f) We have compared recurrent networks with standard long short-term memory units to recurrent networks with a more biologically inspired version of these units. We have found that the biologically inspired version performs more like humans in some conditions.