Community Research and Development Information Service - CORDIS

H2020

DREAM Report Summary

Project ID: 640891
Funded under: H2020-EU.1.2.2.

Periodic Reporting for period 1 - DREAM (Deferred Restructuring of Experience in Autonomous Machines)

Reporting period: 2015-01-01 to 2015-12-31

Summary of the context and overall objectives of the project

"DREAM is a robotics project that incorporates sleep and dream-like processes within a cognitive architecture. This enables an individual robot or a group of robots to consolidate their experience into more useful and generic formats, thus improving their future ability to learn and adapt. DREAM relies on Evolutionary Neurodynamic ensemble methods as a unifying principle for discovery, optimization, re-structuring and consolidation of knowledge. This new paradigm will make the robot more autonomous in its acquisition, organization and use of knowledge and skills just as long as they comply with the satisfaction of pre-established basic motivations.

DREAM will enable robots to cope with the complexity of being an information-processing entity in domains that are open-ended both in space and in time. It paves the way for a new generation of robots whose existence and purpose go far beyond the mere execution of dull tasks.

Accumulating knowledge over long periods of time requires a consolidation process, so as to avoid being overwhelmed by the abundance of incoming information. Sleep has been shown to be critical for many consolidation processes, such as restructuring of representations, maintaining knowledge integration and coherence, improving insight learning, driving abstractions, forming novel levels of description, deleting unwanted information, exploring recombinations of concepts, and stimulating creative thinking (Wagner et al., Nature, 2004). Our targeted scientific breakthrough is to enable robots to gain an open-ended understanding of the world over long periods of time, with alternating periods of experience and sleep. The possible benefits of sleep have so far been neglected in robotics and artificial intelligence.

To achieve higher levels of autonomy and understanding in developmental robotics, we propose a paradigm shift with DREAM, a cognitive architecture that exploits sleep to improve its functioning. It is contended here that Evolutionary Neurodynamic ensemble methods (Fernando et al., Frontiers in Comp Neuro, 2012; Bellas et al., IEEE-TAMD, 2010) are a unifying principle for creative thinking and knowledge consolidation; these methods form the core of DREAM. Our key insight is that the brain consists of three coupled subsystems that are generated and adapted according to experience through evolutionary means: Models, which make predictions about the future state of the environment, notably to understand the results of actions; Policies, which generate actions and behaviours and are related to task-specific perceptual features; and Values, which reward, evaluate and compare policies or models (a minimal sketch of how these subsystems might be coupled is given after Objective 4). The long-term vision is to build genuinely situated and embodied agents with beliefs, desires, personalities, and idiosyncrasies, who are as inevitably influenced by their individual developmental trajectories as we are. To reach the proposed adaptive properties, the architecture will rely on alternating between active interaction and passive introspection over past events, i.e. sleep, and must satisfy the following realistic and measurable objectives:

Objective 1

Evolve and decompose new values and motivations in an open-ended manner, on the basis of a low-dimensional set of immutable intrinsic motivations and other self-built values. Users should be able to guide the evolution of the value system, and values should be exchangeable between different robots. The discovery of new policies and models enables the robot to identify specific skills as capable of fulfilling specific motivations, which drives the exploration of its own abilities. This objective is met when the robot discovers its own talents and capabilities in an open-ended manner ("I can move objects"), can use human interaction to guide and accelerate this discovery ("I should move objects there"), and can use its discovered talents to fulfill externally specified tasks ("tidy the table").

Objective 2

Restructure representations and models to understand and organize the dynamics of being in a particular environment. This understanding is organized as a set of predictive models and efficient policies, which are acquired during experience and then evolved, refined and restructured during sleep. This objective is met when restructuring enables the robot to understand basic causal relationships such as gravity (as studied in Daniel Povinelli's experiments with chimpanzees), or more task-related relationships ("different skills are required to manipulate objects of type A and B").

Objective 3

Consolidate knowledge by committing successful predictive models (those that predict environment dynamics well), value decompositions (those that satisfy lower-level motivations), and policies (those that fulfill values) to long-term memory. This consolidation takes place during sleep. This objective is met when robots can switch between different contexts or domains without catastrophic forgetting of previously acquired models, values and policies ("I am able to manipulate books in the library and cutlery in the kitchen").

Objective 4

Expand knowledge through social interactions by sharing knowledge between different actors in the environment. Different experiences can lead to different models, policies or values, depending on the encountered conditions. Sharing knowledge can lead (1) to the identification of the most efficient ones and (2) to more generic and robust values, models and policies through a consolidation of the knowledge acquired by the different actors. This objective is met when a robot can generalize to an unforeseen context through the experience of other robots ("I can manipulate cutlery even though I have never been in a kitchen before"). Robots should be able to identify one another based on their capabilities and experience to enable privileged information sharing ("we have always been very close; I trust you more than anyone else to solve this new problem").
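
The decomposition into models, policies and values can be illustrated with a minimal sketch. All names below are hypothetical illustrations of the idea rather than the project's implementation, and a trivial mutation-and-selection loop stands in for the Evolutionary Neurodynamic machinery:

import random

class Model:
    """Predicts the next state of the environment given a state and an action (toy dynamics)."""
    def __init__(self, bias=0.0):
        self.bias = bias
    def predict(self, state, action):
        return state + action + self.bias

class Policy:
    """Generates an action from the current state (here, crude regulation towards 0)."""
    def __init__(self, gain=1.0):
        self.gain = gain
    def act(self, state):
        return -self.gain * state

class Value:
    """Scores how well a policy/model pair satisfies a motivation."""
    def score(self, trajectory):
        return -abs(trajectory[-1])   # prefer trajectories that end close to the goal state 0

def rollout(model, policy, state=5.0, steps=10):
    states = [state]
    for _ in range(steps):
        action = policy.act(states[-1])
        states.append(model.predict(states[-1], action))
    return states

# A toy "evolutionary" loop: mutate policies and keep the ones the value prefers.
value = Value()
model = Model()
population = [Policy(gain=random.uniform(0.0, 1.0)) for _ in range(10)]
for generation in range(20):
    scored = sorted(population, key=lambda p: value.score(rollout(model, p)), reverse=True)
    parents = scored[:5]
    population = parents + [Policy(gain=p.gain + random.gauss(0, 0.1)) for p in parents]

best = max(population, key=lambda p: value.score(rollout(model, p)))
print("best gain:", round(best.gain, 2))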

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

"DREAM is a fundamental research project that proposes to study approaches alternating between active periods in the real world ("daytime") and analysis or exploration in simulation ("nighttime") for representational redescription in robotics. This vision of robotics development and learning is original and requires to lay down new grounds. The implications of such approaches need in particular to be identified. Furthermore, adapted evaluation metrics as well as experimental protocols need to be defined. Once this is done, current algorithms can be adapted to this new vision, or new ones can be developed. It implies that the project will first need to make progress on a fundamental basis before focusing on experimental results. Besides the setting up of the different tasks, the main result of this first year of the project is then the definition of a terminology covering the main notions related to this vision, of metrics to evaluate development progress and of experimental protocols to test them. CAFER, a software framework has also been implemented on top of the ROS middleware. Its goal is to facilitate the implementation and integration of components in the DREAM cognitive architecture. The first public release has been created in December 2015. Besides this fundamental work, we have also made progress on the different components of the future DREAM cognitive architecture (utility functions, policies and predictive models), on the cognitive architecture itself and on the links with neuroscience and biologically plausible models.

Utility functions describe the rewards that the robot will try to optimize during its development. They include a function describing how well the robot achieves its mission (a global utility function provided at the beginning of the developmental process), but also utility functions that are built on the fly during development. Other utility functions may be mission-independent and describe motivations such as curiosity. Finding out which functions to provide and how to create intermediate reward functions is one of the goals of the project. During this first year, we have started to define the motivational engine aimed at supporting utility function management, and we have explored curiosity-based motivations as well as the creation of sub-goals, i.e. intermediate utility functions that help the robot acquire the skills it needs to fulfill its mission.
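
One common way to implement a curiosity-like motivation is to use the prediction error of a learned forward model as an intrinsic reward added to the mission-level utility; whether the project uses exactly this formulation is an assumption, and the sketch below is only illustrative:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model: predicts the next state as a linear function of (state, action).
w = np.zeros(2)

def mission_utility(state):
    # Global utility provided at the start of development: get close to state 1.0.
    return -abs(state - 1.0)

def curiosity_reward(state, action, next_state):
    # Intrinsic reward = forward-model prediction error ("surprise").
    prediction = w @ np.array([state, action])
    return abs(next_state - prediction)

state, total_reward = 0.0, 0.0
for step in range(200):
    action = rng.uniform(-0.5, 0.5)
    next_state = 0.8 * state + action              # dynamics unknown to the robot
    total_reward += mission_utility(next_state) + 0.5 * curiosity_reward(state, action, next_state)
    # Updating the forward model makes surprise (and thus curiosity) fade with experience.
    x = np.array([state, action])
    w += 0.1 * (next_state - w @ x) * x
    state = next_state

print("learned dynamics weights:", np.round(w, 2))   # should approach [0.8, 1.0]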

Policies determine the behaviour exhibited by the robot in response to its perceptions. A policy relies on different representations that are of critical importance for its efficiency (perceptions, actions, and the links between the two). We aim to let the robot discover relevant representations instead of providing them. In scenarios of limited complexity (few degrees of freedom, limited perception ability), we have defined an algorithm that builds the representations required by a classical reinforcement learning algorithm out of the sensorimotor flow experienced by the robot during a first, agnostic learning process. It significantly increases both generalization ability and learning speed. In a more realistic robotic setup, in which an arm can interact with objects and the scene is perceived by a depth camera, we have defined an approach relying on a first babbling phase in the real world (daytime) followed by an exploration in simulation with an evolutionary algorithm called MAP-Elites (nighttime) to generate the data required for the representational redescription process. The algorithm discovered behaviours in which a cube is grasped and launched into a box. The next step is to extract perceptions and actions that allow the robot to reproduce these behaviours in different situations without having to learn again (generalization), or to exploit this knowledge to learn faster to grasp different objects (faster learning). We are currently exploring the use of deep learning and gated auto-encoders for the perception part, and of dynamical movement primitives for the action part. Other formalisms, such as Bayesian networks, are under consideration, notably to propose a new framework allowing knowledge to be transferred from one task to another.
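
For reference, a minimal generic MAP-Elites loop is sketched below on a toy problem; the fitness function, behaviour descriptor and grid resolution are illustrative stand-ins for the actual arm-and-cube setup:

import numpy as np

rng = np.random.default_rng(1)

def evaluate(genome):
    """Toy evaluation: fitness rewards small genomes, the descriptor is where the behaviour 'lands'."""
    fitness = -float(np.sum(genome ** 2))
    descriptor = np.clip((genome[:2] + 1.0) / 2.0, 0.0, 1.0)    # 2-D behaviour in [0, 1]^2
    return fitness, descriptor

def cell_of(descriptor, bins=10):
    return tuple(np.minimum((descriptor * bins).astype(int), bins - 1))

archive = {}    # cell -> (fitness, genome): one elite per behaviour niche

for iteration in range(5000):
    if len(archive) < 50:
        genome = rng.uniform(-1, 1, size=4)                      # bootstrap with random genomes
    else:
        _, parent = list(archive.values())[rng.integers(len(archive))]
        genome = parent + rng.normal(0, 0.1, size=4)             # mutate a randomly chosen elite
    fitness, descriptor = evaluate(genome)
    cell = cell_of(descriptor)
    if cell not in archive or fitness > archive[cell][0]:        # keep the best genome per cell
        archive[cell] = (fitness, genome)

print("behaviour niches filled:", len(archive), "out of 100")

The key design choice is that the archive keeps one elite per behaviour niche, so the nighttime exploration yields a diverse repertoire of behaviours rather than a single optimum.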

Models allow the robot to predict the result of an action (what the new state or perceptions will be, what reward is obtained). They are required to learn and plan actions. Regression algorithms, which learn relationships between inputs and outputs, are at the heart of many model learning approaches. We have developed a unified framework that distinguishes the representation from the algorithm used to learn it. Many different regression algorithms turn out to rely on the same representation: they are all special cases of a unified model. This new view aligns with DREAM's goal of focusing on representation restructuring by separating the representation from the algorithm that learns it.
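
The separation between representation and learning algorithm can be illustrated as follows: the same assumed representation (a weighted sum of Gaussian basis functions) is trained once with batch regularized least squares and once with incremental gradient descent, two of the many regression algorithms that share this form:

import numpy as np

rng = np.random.default_rng(2)

# Shared representation: f(x) = sum_k w_k * phi_k(x), with Gaussian basis functions.
centers = np.linspace(0, 1, 10)
width = 0.1

def features(x):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Training data for a toy 1-D function.
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.05, size=x.shape)
Phi = features(x)

# Learning algorithm 1: batch regularized least squares.
w_batch = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(len(centers)), Phi.T @ y)

# Learning algorithm 2: incremental (stochastic) gradient descent on the same representation.
w_sgd = np.zeros(len(centers))
for epoch in range(200):
    for phi_i, y_i in zip(Phi, y):
        w_sgd += 0.1 * (y_i - phi_i @ w_sgd) * phi_i

x_test = np.linspace(0, 1, 5)
print("least squares:", np.round(features(x_test) @ w_batch, 2))
print("sgd          :", np.round(features(x_test) @ w_sgd, 2))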

Utility functions, policies and predictive models are managed by a cognitive architecture that must decide which of these elements to exploit at a given time and in a given context. Several time scales are intermixed, and recently acquired information (short-term memory) has to be considered and compared to previously consolidated information (long-term memory). The DREAM cognitive architecture is based on a multi-level Darwinian brain architecture, which is currently being extended to handle these issues. Likewise, we have started to explore the mathematical framework on which to rely for knowledge abstraction.
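
As a rough, assumed illustration of the comparison between short-term and long-term memory (not the project's actual architecture), the sketch below only commits a recently learned model to long-term memory if it outperforms the previously consolidated one for the same context, so that older contexts are not silently overwritten:

import numpy as np

rng = np.random.default_rng(3)

def prediction_error(weights, data):
    x, y = data
    return float(np.mean((y - x @ weights) ** 2))

long_term = {}     # context -> consolidated model weights
short_term = []    # recently learned (context, weights, evaluation data) candidates

# Fake "daytime" learning in two contexts, each with its own linear dynamics.
contexts = {"library": np.array([0.5, -0.2]), "kitchen": np.array([-0.3, 0.8])}
for name, true_w in contexts.items():
    x = rng.normal(size=(50, 2))
    y = x @ true_w + rng.normal(0, 0.1, size=50)
    w = np.linalg.lstsq(x, y, rcond=None)[0]
    short_term.append((name, w, (x, y)))

# "Nighttime" consolidation: commit a candidate only if it beats the stored model for its context.
for name, w, data in short_term:
    if name not in long_term or prediction_error(w, data) < prediction_error(long_term[name], data):
        long_term[name] = w
short_term.clear()

print({name: np.round(w, 2) for name, w in long_term.items()})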

Beyond the development of a single individual, we study the influence of knowledge sharing between several robots. During this first year, we have developed the experimental setup and run first tests of this social learning approach in an obstacle avoidance task, both in simulation and in reality, on Thymio II robots. These preliminary experiments have shown the potential of the approach, which we are currently testing on more difficult tasks.
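
The report does not detail the sharing protocol, so the following is only an assumed sketch of one simple form of social learning, in which robots broadcast their controller parameters together with their measured fitness and receivers adopt better-performing controllers:

import random

class Robot:
    def __init__(self, name):
        self.name = name
        self.params = [random.uniform(-1, 1) for _ in range(4)]   # controller weights
        self.fitness = None

    def evaluate(self):
        # Stand-in for an obstacle-avoidance trial: pretend the ideal weights are all 0.5.
        self.fitness = -sum((p - 0.5) ** 2 for p in self.params)

    def broadcast(self):
        return (self.params[:], self.fitness)

    def receive(self, params, fitness):
        # Adopt the received controller (with a small mutation) if the sender reports a better fitness.
        if fitness > self.fitness:
            self.params = [p + random.gauss(0, 0.05) for p in params]

swarm = [Robot(f"thymio_{i}") for i in range(5)]
for generation in range(30):
    for robot in swarm:
        robot.evaluate()
    for sender in swarm:
        msg = sender.broadcast()
        for receiver in swarm:
            if receiver is not sender:
                receiver.receive(*msg)

print(sorted(round(r.fitness, 3) for r in swarm))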

DREAM tries to replicate some of the functions attributed to sleep and dream processes in animals and humans. A work package is dedicated to identifying the relevant neuroscience literature and to studying biologically plausible models of the knowledge redescription processes occurring during sleep. These multi-disciplinary exchanges have two goals: identifying computational models that can be adapted to our robotics applications, and designing new neuroscience models inspired by DREAM ideas. During this first year, we have focused on song development in zebra finches. The development of bird song shows interesting properties with respect to DREAM: the song starts completely unstructured and progressively acquires structure during development. Sleep seems to play a significant role in this process, but its exact role is not yet known and no model has been proposed to explain it. We have thus started a modeling effort, in collaboration with experimental biologists, to explore the relevance of DREAM hypotheses and ideas in this context. We have also explored biologically plausible models of perception (deep learning) and observed the positive impact of data augmentation, which is equivalent to the replay observed during sleep. A study comparing different architectures has also highlighted the positive impact of ensembles of neural networks, which was one of the hypotheses we proposed to explore in the project.
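
The two effects mentioned above can be illustrated with a small, assumed stand-in for the actual deep-learning setup: each member of an ensemble of small networks is trained on its own noise-perturbed replay of the training data, and their predictions are averaged:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def augment(X, y, copies=3, noise=0.3):
    """'Replay' the training data several times with small perturbations."""
    Xs = [X] + [X + rng.normal(0, noise, size=X.shape) for _ in range(copies)]
    ys = [y] * (copies + 1)
    return np.vstack(Xs), np.concatenate(ys)

# Ensemble of small networks, each trained on its own augmented replay of the data.
ensemble = []
for seed in range(5):
    Xa, ya = augment(X_train, y_train)
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=seed)
    ensemble.append(net.fit(Xa, ya))

# Averaged predictions of the ensemble.
proba = np.mean([net.predict_proba(X_test) for net in ensemble], axis=0)
accuracy = np.mean(proba.argmax(axis=1) == y_test)
print("ensemble accuracy:", round(accuracy, 3))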

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

"DREAM is a fundamental research project. The first year has been focused on laying the foundations of the representational restructuring approach advocated in the project. The main progresses beyond the state of the art during this first year are then related to the methodological and theoretical basis that we will rely on in the rest of the project. We have also proposed and tested algorithms to perform the representational redescription process or to generate the information it needs. We have likewise identified and started to build computational neuroscience models of sleep improvement of learning during development, in a motor context involving representational redescriptions: the development of song in birds (zebra finches). While this animal model of development is well studied, the implication of sleep in it has been little investigated. Each of those different contributions are expected to have a significant impact on robotics and on the potential use of robots as well as on neuroscience, especially in the communities studying vocalization development and learning improvement during sleep.

The methodology developed during the first year of the project defines the framework in which the following work will take place. We have first defined a terminology and a methodology for validating our results. The terminology proved necessary because the notions of interest for the project had different names depending on the scientific community: the partners did not initially agree on the terms, and we converged on a nomenclature after discussion. Sharing a common vocabulary was mandatory for us; it is furthermore a mandatory step for creating a research community around this topic. The methodology is a second significant step in this direction, as it allows Kuhn's 'normal science' to occur, which is necessary to consolidate knowledge and federate researchers. Our methodology relies on the definition of different criteria to measure developmental progress. It allows methods to be compared and defines criteria for knowing when our goal has been reached. The importance of this contribution has been recognized by David Vernon, an expert in developmental robotics, who was invited as a scientific advisor at a DREAM meeting. These contributions help define the research field focused on DREAM's questions, but there is also a need to define the theoretical basis that will make this field progress towards its goals.

We have worked on the theoretical basis needed to open new research avenues towards the project goals. Many different regression algorithms exist today, each with its own specificities. These algorithms are critical for model learning, one of the key aspects of our developmental approach. We have proposed a new definition of such algorithms that unifies them under the perspective of a representation and a learning algorithm, a perspective that is at the core of DREAM. We have also proposed an approach to formalize the transfer of knowledge when learning Bayesian networks, which are powerful representations for making predictions while managing uncertainty. These contributions will help create a community around these topics and will feed it with different paths to explore. A workshop will be organized in 2016 to advertise this work and to federate researchers.

The impact of these first contributions will not be limited to academia. The criteria we have defined measure progress towards adaptive robots. More precisely, they measure (1) generalization ability and (2) learning speed. Generalization is the ability to exhibit a given behaviour in a large range of conditions. This is expected to help robots move out of controlled conditions and face open environments. Up to now, this has only been possible for reactive behaviours involving navigation, as exemplified by vacuum cleaner robots. Behaviours implying object manipulation are more fragile, as they critically depend on object shape and weight. It is possible to program a robot to grasp a particular set of objects, but it remains difficult to program a controller that can grasp any object. Mechanical devices like the Versaball from Universal Robotics make grasping easier, but they do not completely solve the problem, as they do not determine what to do with the object or how to place it on a table. Providing robots with the ability to discover this by themselves is therefore a critical step towards the design of robots able to face open environments. The existence of such robots would create new markets for robotics, notably service robotics, with applications such as housekeeping or search and rescue.

The cross-fertilization between robotics and neuroscience may have an impact that goes beyond engineering and robotics. We aim to better understand how a representation restructuring process occurs, in order to provide robots with this ability. Animals and humans have this ability, and drawing inspiration from what we know of this process may help us in our quest. The ideas we propose to test in a robotics context may in turn also help us better understand how animals and humans actually do it. Little is known yet about the neural substrate of this process; the algorithmic principles we are proposing may give some insights to neuroscientists, just as reinforcement learning helped them build new models of reward-based learning. We are exploring the relevance of DREAM principles for a well-studied developmental setup: the development of zebra finch song. The role of sleep in consolidating memories and in improving learning has been shown empirically, but the mechanisms underlying these processes are still poorly understood. It has been shown, in navigation tasks with rodents, that the sequential reactivations of hippocampal neurons coding for place representations during "sharp wave ripples" are the physiological substrate of such sleep-learning processes. How such processes also contribute to the acquisition of lower-level motor skills is, however, unknown. The development of zebra finch vocalizations is interesting per se but, more importantly, is considered a good model of vocalization development in general. Proposing a computational model of sleep effects on vocalization development in the zebra finch will thus allow us to gain knowledge on the development of vocalization in other animals and humans, and on sleep-related learning consolidation for motor functions.
