Visual perception in deep neural networks

Imagine you could regulate your emotions, thoughts, and perceptions as you please. Instead of listening to music or consuming alcohol or drugs, you would click a button, and a short while later a bad day would become the best day ever, or scattered thoughts would give way to sharp focus, leaving you ready to study for an exam. While such possibilities are still far beyond our reach, getting there one day requires models of the human brain. In other words, we need models that, given a particular stimulus, produce an output indistinguishable from the human response to that stimulus. Knowing this mapping, we could then use the model to tell us what kind of stimulation would best elicit a desired mental state.

In this project, we took the first steps in this direction by building accurate predictive models of human and non-human primate neural and behavioral responses in a demanding visual object recognition task. We focused on three major objectives: (i) establish an extensive benchmark of human visual processing; (ii) using this benchmark, evaluate the quality of machine decisions in relation to human performance; and (iii) using the insights gained from such a comparison, develop new, biologically informed, state-of-the-art architectures. We reached these goals by building a large-scale integrative benchmarking platform called Brain-Score, evaluating tens of models on it, and developing CORnet, the current best model of the visual system. Going forward, we expect our heavily quantitative and engineering-focused approach to understanding the visual system to scale to building models of the entire brain.

During the project, we developed a large-scale integrative benchmarking suite called Brain-Score that enables a principled comparison between the available brain data and models of the brain. Using this benchmark, we evaluated how well deep neural networks, the current best machine learning models, can predict brain data. We established that, overall, deep neural networks can predict brain responses to a high degree, yet architecturally they remain rather dissociated from neurophysiological constraints and are inadequate at predicting brain responses over time. Guided by these insights, we developed CORnet, a shallow recurrent deep neural network that aligns with the known brain anatomy and is the current best brain-predicting model.
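As a rough illustration of what comparing a model against brain data can involve, the Python sketch below fits a linear map from model activations to recorded responses and scores it by held-out correlation. This is an assumption made for illustration, not the Brain-Score implementation: the function name `predictivity_score`, the ridge penalty, the single train/test split, and the input arrays are all hypothetical.

```python
import numpy as np


def predictivity_score(features, responses, train_fraction=0.8, ridge=1.0, seed=0):
    """Median held-out Pearson r between predicted and recorded responses.

    features:  (n_images, n_features) model activations (hypothetical input)
    responses: (n_images, n_neurons) recorded responses (hypothetical input)
    """
    rng = np.random.default_rng(seed)
    n_images = features.shape[0]
    order = rng.permutation(n_images)
    n_train = int(train_fraction * n_images)
    train, test = order[:n_train], order[n_train:]

    # Centre both sides using statistics from the training images only.
    x_mean = features[train].mean(axis=0)
    y_mean = responses[train].mean(axis=0)
    x_train = features[train] - x_mean
    y_train = responses[train] - y_mean

    # Closed-form ridge regression: (X'X + ridge * I) W = X'Y.
    gram = x_train.T @ x_train + ridge * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ y_train)

    # Predict responses to the held-out images.
    predicted = (features[test] - x_mean) @ weights + y_mean
    actual = responses[test]

    # Pearson correlation per recording site, then the median across sites.
    p = predicted - predicted.mean(axis=0)
    a = actual - actual.mean(axis=0)
    r = (p * a).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (a ** 2).sum(axis=0))
    return float(np.median(r))
```

Under these assumptions, a model whose activations linearly explain the recordings scores close to 1, while unrelated activations score close to 0. A real benchmark would additionally need cross-validation and a correction for the noise in the recordings themselves.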

We further evaluated how robust current machine learning models are when presented with images unlike those they were trained to recognize. Surprisingly, we found that, with minimal retraining, models can generalize better than previously thought, suggesting that building robust models of visual processing may benefit from training on even larger image datasets than are currently available. To facilitate the generation of such large-scale datasets, which can be precisely controlled for various research questions, we built a photorealistic 3D virtual environment where virtual agents can interact with their surroundings, allowing rapid testing of hypotheses about how such systems learn.

Our work has been published in top neuroscience and machine learning venues and presented at multiple conferences. It has also been made accessible to the general public in various formats, including high-profile museum exhibitions, popular lectures, and publications on the topics of science and AI.

We made our Brain-Score benchmark dataset freely available for everybody to use. Researchers can upload their models to our website, www.brain-score.org, to compare them against other models, and can also submit their own datasets to challenge the models. We also made the data and the tools for doing such comparisons openly available. Similarly, we put a lot of effort into building models that are as biologically aligned as possible, so that they work on typical machine learning tasks while also being as predictive and compact as the neuroscience community desires. These models are openly available, with particular emphasis on making them usable by others: even inexperienced users should be able to understand how they work and run them.

By making this platform and all these tools open to all, we hope to lead a shift in the approach to neuroscience. While until recently most research focused on phenomenological descriptions, we want to encourage researchers to ask quantitative questions and rigorously quantify progress in predicting how the brain will respond. Such a shift is long overdue, in our opinion, and could be best likened to the shift that happened in astronomy 500 years ago. Until Kepler’s laws of planetary motion and the subsequent mathematical formulations by Newton, most astronomers were busy documenting planetary and star positions but struggled to see clear patterns. Kepler’s laws concentrated that empirical knowledge into a formal predictive model. We hope that our benchmarking tools can contribute to a similarly broad cultural shift in neuroscience.

In addition, our virtual environment will help researchers accelerate their experimental cycle. Collecting and processing data that used to take months can now be done at much lower cost in a simulated reality where all attributes of every single entity are under the experimenter’s control. This platform is being prepared for public access.

Periodic Reporting for period 2 - DEEPCEPTION (Visual perception in deep neural networks)
