Final Report Summary - VIFER (The Visual Front-End of Reading)
Objectives: We aimed to deepen the understanding of the visual perception of written words and of how this “Visual Front End of Reading” is shaped by statistical learning. We planned to achieve this by providing, for the first time, a comprehensive learning-based neurocomputational account of written word perception. We hoped that the model would develop, from the statistical properties of visual words, a hierarchical gradient of visual-lexical processing consisting of increasingly complex and abstract word features, similar to those found through fMRI in the left occipitotemporal cortex. The project planned to integrate the neurocomputational modeling approach with a second research method, electrophysiology (EEG). The EEG method was expected to provide novel types of neural signatures characterizing the timing and spatial aspects of the cortical processing involved in reading, and the model was expected to account for these findings. The PI was expected to exploit and extend his expertise in computational modeling and to acquire proficiency in the EEG technique through research.
Modeling: The most important research work and achievement of the VIFER project was the development of several advanced neurocomputational models that shed light on the fine details of visual word perception. We exploited several types of generative models, each of which addressed a different aspect of the cognitive processes involved in word perception, and extended the research beyond the planned activity towards a more complete picture of the complex phenomenon of reading. The first set of results concerns the processes involved in the perception of letters, the elementary units that build words. Simulations with deep generative networks revealed a plausible hierarchical structure of visual features that emerged during unsupervised perceptual learning. The study provided computational support for the recycling hypothesis, according to which cortical structures tuned to support domain-general vision also support the perception of human artifacts, including letters. We then investigated how letters could sequentially combine into words, and the emergence of high-level lexical statistical regularities. An extension of the same model that we used for letter perception - recurrent generative neural networks - could readily acquire lexical grammars that might encode a sense of wordness. The critical novelty of this result was the demonstration that the generative approach could be successfully applied in the domain of sequential letter processing, and that the emergent statistical model readily simulates essential psycholinguistic patterns. Most important to the modeling line of research was the work on whole-word visual perception starting from raw visual input, a result that filled a significant gap in this domain even though word recognition is a traditional problem from an engineering point of view.
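As a toy illustration of how letter-sequence statistics can encode a sense of "wordness" (this is not the project's recurrent generative network, and the tiny lexicon below is purely hypothetical), even a smoothed letter-bigram model assigns higher probability to word-like strings than to scrambled ones:

```python
from collections import defaultdict
import math

def train_bigrams(words):
    """Count letter-bigram frequencies, with word-boundary markers ^ and $."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def wordness(s, counts, alpha=1.0):
    """Smoothed log-probability of a letter string under the bigram model."""
    vocab = 27  # rough vocabulary size for add-alpha smoothing (assumption)
    logp = 0.0
    padded = "^" + s + "$"
    for a, b in zip(padded, padded[1:]):
        total = sum(counts[a].values())
        logp += math.log((counts[a][b] + alpha) / (total + alpha * vocab))
    return logp

# Tiny illustrative lexicon (hypothetical, for demonstration only)
lexicon = ["reading", "read", "letter", "letters", "word", "words",
           "perception", "percept", "visual", "vision"]
model = train_bigrams(lexicon)

# A word-like string scores higher than a scrambled version of it.
print(wordness("wording", model) > wordness("wrnidog", model))
```

The actual models in the project learned far richer sequential structure, but the comparison above captures the core intuition: statistical regularities over letter sequences alone already separate plausible from implausible strings.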
By using deep generative networks (Figure 1a) and assuming sensory input with realistic variability of spatial position (Figure 1b), we developed a neurocomputational model that could unveil for the first time a wide range of neural-level details of visual word perception (Figure 1c), including spatial normalization. The model also explained how the same normalization process induces a range of psycholinguistic phenomena, for example, tolerance to letter transposition (Figure 1d). Moreover, the simulations revealed fine details of the hierarchical structure supporting word perception, and showed that single letters are the essential top-level perceptual elements in this hierarchy (Figure 1c). Finally, we should mention one important additional line of research in the domain of goal-directed behavior, which paved the way for building a complete learning model of reading that includes both perception and the control of saccades.
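The positional variability assumed for the sensory input (Figure 1b) can be mimicked by presenting the same letter string at random horizontal offsets on a fixed retina-like array. The sketch below is a simplified stand-in for the model's actual input format: the retina width and one-hot letter encoding are assumptions made only for illustration.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
RETINA_SLOTS = 12  # width of the input "retina" in letter slots (assumption)

def encode(word, offset):
    """One-hot encode a word placed at a given slot offset on the retina."""
    x = np.zeros((RETINA_SLOTS, len(ALPHABET)))
    for i, ch in enumerate(word):
        x[offset + i, ALPHABET.index(ch)] = 1.0
    return x

def sample_input(word, rng):
    """Draw a training example with random spatial position (cf. Figure 1b)."""
    max_offset = RETINA_SLOTS - len(word)
    return encode(word, rng.integers(0, max_offset + 1))

rng = np.random.default_rng(0)
batch = [sample_input("word", rng) for _ in range(5)]
# Every sample contains the same letters, but at varying retinal positions,
# so a network trained on such input must learn spatial normalization.
```

Training a generative network on inputs of this kind is what forces position-invariant word representations to emerge, which is the normalization process discussed above.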
EEG: The second line of planned research activity included advanced EEG analysis aimed at identifying EEG signatures of perceptual components in reading. These signatures were expected to shed light on the complex dynamics of the visual perceptual processes in reading and could have practical applications, such as decoding perceived letters/words from cortical activity during reading. We also planned to use time-space EEG components as a neural-level benchmark for our neurocomputational models of reading. This research was motivated by exciting earlier work showing differential EEG patterns of activity for different letters. However, our extensive research on the identification of EEG signatures of letters showed that the EEG signal only allows the discrimination of low-level visual features. Following this unexpected result, we turned to a novel EEG approach for studying the neural basis of cognitive functions, based on so-called Steady-State Visual Evoked Potentials (SSVEPs). SSVEPs represent the electrophysiological response of the cortex to the flickering of one or a few components of the visual stimuli. We tested the possibility of using SSVEPs as a tool for investigating the mechanisms involved in visual word perception: by flickering different parts of words at different frequencies, one can characterize the underlying neural structures. We also adopted the assumption that the power of the SSVEPs provides a measure of the complexity of the underlying network organization. First, in an exploratory study, we confirmed this intuition by showing that words evoke stronger SSVEPs than pseudowords and that high-frequency words elicit stronger SSVEPs than low-frequency words.
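Frequency tagging of this kind is typically quantified by reading out spectral power at the flicker frequency. The sketch below illustrates only the basic measurement on a synthetic signal (all parameters - sampling rate, trial length, noise level - are arbitrary assumptions, not the project's actual analysis pipeline):

```python
import numpy as np

fs = 250.0   # sampling rate in Hz (assumption)
dur = 10.0   # trial length in seconds (assumption)
t = np.arange(0, dur, 1 / fs)

f_tag = 6.0  # flicker frequency tagging one part of the stimulus (assumption)
rng = np.random.default_rng(1)
# Synthetic "EEG": an SSVEP at the tagging frequency buried in noise.
signal = np.sin(2 * np.pi * f_tag * t) + rng.normal(0, 2.0, t.size)

# Power spectrum via FFT; SSVEP power is read out at the tagging frequency.
spectrum = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)

tag_bin = np.argmin(np.abs(freqs - f_tag))
ssvep_power = spectrum[tag_bin]
noise_floor = np.median(spectrum)      # crude broadband noise estimate
print(ssvep_power > 10 * noise_floor)  # the tagged frequency stands out
```

Because the flicker frequency is known exactly, the response concentrates in a single narrow spectral bin, which is what makes SSVEP power a robust dependent measure even with noisy single-electrode recordings.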
Following this promising result, we ran a second SSVEP study aimed at investigating the properties of letter position coding in the context of a classical letter-transposition paradigm, in which words are contrasted with strings that differ from the words only by the transposition of two neighboring letters (a bigram), whose position within the string was also manipulated. We flickered at one frequency the letters that were (or had to be) transposed, and at another frequency the rest of the stimulus. The fundamental prediction was that non-transposed letters would elicit stronger SSVEPs than transposed ones, because the non-transposed letters are part of a word stimulus and thus evoke stronger activation of the underlying neural processing mechanism. More interestingly, based on human data and our neurocomputational account, we also predicted that the level of position-coding uncertainty would affect SSVEP power, whereby letters with greater position-coding uncertainty would result in weaker SSVEPs. The results of two experiments confirmed these predictions. We found significant differences between transposed and non-transposed stimuli and, at the letter level, a greater difference towards the beginning of the words, consistent with the smaller position uncertainty there. Overall, the results are extremely promising: they pave the way towards a fundamentally different way of studying the details of the perceptual processes involved in reading, and provide novel benchmarks for neurocomputational studies.
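The dual-frequency tagging design described above can be sketched as a per-letter luminance schedule: the (to-be-)transposed bigram flickers at one frequency and all remaining letters at another. The word, bigram position, frequencies, and refresh rate below are illustrative assumptions, not the actual stimulus parameters of the study.

```python
import numpy as np

fs = 120.0                     # display refresh rate in Hz (assumption)
t = np.arange(0, 5.0, 1 / fs)  # 5-second trial (assumption)

word = "judge"
target = {2, 3}                # indices of the (to-be-)transposed bigram (illustrative)
f_target, f_rest = 6.0, 7.5    # the two tagging frequencies (illustrative values)

def luminance(freq):
    """Sinusoidal luminance modulation between 0 and 1 at the flicker frequency."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * freq * t)

# One luminance time course per letter: the target bigram is tagged at
# f_target, all remaining letters at f_rest.
schedule = {i: luminance(f_target if i in target else f_rest)
            for i in range(len(word))}
```

Comparing the EEG power at f_target across intact and transposed conditions then isolates the response driven specifically by the manipulated bigram, independently of the rest of the string.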