We have developed languages for expressing action models, general policies, and hierarchical representations, along with both combinatorial and deep learning methods for learning them without supervision. In the case of action models, we took the languages "off-the-shelf" (namely, STRIPS and PDDL), but in the second half of the project we will look at more suitable and expressive alternatives. All of this is novel and "beyond the state of the art". Yet the main contribution is a methodology for addressing these problems step by step, in a systematic way, that leaves behind solid concepts and techniques. These concepts and techniques will be the building blocks for the next round of challenges in the project: learning action models, general policies, and hierarchical representations from raw data in the form of images, and dealing with continuous action and state spaces.

We are well aware that current deep reinforcement learning methods, and policy optimization methods in particular, deal quite smoothly with both images and continuous action and state spaces. Yet these methods, by themselves, produce representations that are not meaningful, general, transparent, or reusable. The goal of RLeap is to develop a methodology for obtaining the best of both worlds: learning systems that produce meaningful, flexible, and reusable high-level representations from low-level, raw data. In this sense, deep learning and deep reinforcement learning will be part of the solution, but they will not be the full solution. In particular, state representations are to be learned over high-level logical languages, or it should at least be possible to interpret them in those terms.
By the end of the project, we would be satisfied if we were able to learn high-level, meaningful, and transparent action models, general policies, and general problem decompositions (sketches) in two domains: video games such as the Atari games, and robotics. The first involves learning high-level representations from raw pixels. The second involves, in addition to camera input, controlling a robot arm that operates in continuous action and state spaces. If we manage to learn in those domains what we currently manage to learn in STRIPS-like settings, the goals of the project will have been accomplished. This is the agenda. A lot has been learned, and a lot remains to be learned.