Building an AI program applicable to any situation is very difficult, because all those situations, and the behaviours suited to them, first have to be identified. The insight that inspired the EU-supported DREAM (Deferred Restructuring of Experience in Autonomous Machines) project was that processes similar to those identified during sleep could help robots more easily acquire, organise and use knowledge and skills. Exposing robots to scenarios that were more open-ended in space and time led the team to proposals for a new generation of robots.
Within the field of machine learning, ‘reinforcement learning’, which links desired behaviours to positive feedback, has been suggested as a way to teach robots how to complete tasks. However, due to several limitations, this approach has not yet been applied. Chief amongst these limitations is that the underlying algorithms cannot ascribe cause and effect. As project manager Prof. Stéphane Doncieux explains, “Suppose the robot gets a numerical value signal as positive feedback, to really learn, the algorithm needs to know what state this value is associated with: is it due to its arm movement, to a button being pushed or to something else?”
DREAM reduced the amount of specific information necessary for a robot to accomplish a task, developing adaptive algorithms that could be applied to different scenarios yet still find appropriate solutions without continual modification. “Current learning algorithms often assume expert knowledge. In fact, naive learning offers opportunities if you can exploit it appropriately. This is reminiscent of what happens when animals and humans sleep,” says Prof. Doncieux.
In practical terms, robot learning becomes a sequence of processes alternating between interactions with the real world and exploitation of the data generated, rather than a single process. During the ‘awake’ sessions, the robot observed the consequences of its actions to understand how the environment is structured. During ‘dreaming’, the robot explored, in simulation, many possible interactions, registering those that generated identifiable effects on a chosen object (e.g. moving it). At this stage the robot could perform simple tasks, but only within tight parameters; these tasks provided a kind of library of actions with which to train deep learning algorithms. Another ‘dreaming’ process based on such algorithms helped the robot to generalise these actions to other situations.
Other ‘dreaming’ phases were focused on transfer learning, to build upon the knowledge acquired. Various approaches were explored, including transfer from short-term to long-term memory and transfer between different individuals (social learning), as knowledge acquired in a group has been shown to accelerate learning and make it more robust.
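For readers who think in code, the alternation between ‘awake’ and ‘dreaming’ phases can be pictured with a toy sketch. It is not DREAM's software: the one-dimensional pushing world, the friction value, the nearest-neighbour simulator and every function name below are assumptions made purely for illustration, and the deep learning and transfer learning stages are left out.

```python
import random


def real_world_push(force):
    """Hypothetical 'real' world: an object moves only if the push
    exceeds a friction threshold the robot does not know in advance."""
    friction = 0.3  # assumed value, illustration only
    return 2.0 * (force - friction) if force > friction else 0.0


def awake_phase(n_trials, rng):
    """Act in the real world and log (action, observed effect) pairs."""
    log = []
    for _ in range(n_trials):
        force = rng.uniform(0.0, 1.0)
        log.append((force, real_world_push(force)))
    return log


def fit_model(log):
    """Build a crude simulator from the log (nearest-neighbour lookup)."""
    def simulate(force):
        nearest = min(log, key=lambda pair: abs(pair[0] - force))
        return nearest[1]
    return simulate


def dreaming_phase(simulate, n_dreams, rng, min_effect=0.1):
    """Try many actions in simulation only; keep those predicted to
    have an identifiable effect on the object (here: it moves)."""
    library = []
    for _ in range(n_dreams):
        force = rng.uniform(0.0, 1.0)
        predicted = simulate(force)
        if predicted > min_effect:
            library.append((force, predicted))
    return library


def run(cycles=3, seed=0):
    """Alternate a few real interactions with many simulated ones."""
    rng = random.Random(seed)
    log, library = [], []
    for _ in range(cycles):
        log += awake_phase(5, rng)                   # 'awake': gather data
        simulate = fit_model(log)                    # restructure experience
        library = dreaming_phase(simulate, 50, rng)  # 'dreaming': explore
    return log, library


if __name__ == "__main__":
    log, library = run()
    print(len(log), "real interactions,", len(library), "actions in library")
```

The point of the sketch is the ratio: a handful of costly real-world trials feed a cheap simulator, which is then queried many more times to assemble the library of actions with a visible effect.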
A new paradigm within grasp
DREAM experimented with different humanoid robots, PR2 and Baxter for instance, focusing on object interaction using their arms. “The robots distinguished which parts of the environments they can act on for a particular effect (like moving or lifting). Crucially, the proposed adaptation methods could deal with different tasks without modification. For example, depending on the effect we asked them to explore, they could generate ball handling or joystick manipulation,” says Prof. Doncieux. Encouraged by their experiments, the team is now working at the theoretical level to shed more light on some of the building blocks of their approach, such as how robots can discover relevant behaviours when little is known about what actions or states should look like.
DREAM, robot, AI, machine learning, sleep, dream, algorithm, reinforcement learning