
From Data-based to Model-based AI: Representation Learning for Planning

Periodic Reporting for period 2 - RLeap (From Data-based to Model-based AI: Representation Learning for Planning)

Reporting period: 2022-04-01 to 2022-12-31

The goal of the RLeap project is to address and solve the problem of learning meaningful, structured, high-level symbolic representations from non-symbolic data (raw images) in the setting of action, planning, and control. The problem is far from solved, with current ideas and methods proving inadequate. On the one hand, data-based methods such as those based on deep learning and deep reinforcement learning have been shown to produce powerful controllers, but not the type of crisp and structured representations that support transparency, generality, and compositionality. On the other hand, model-based methods can be flexible, transparent, and reliable, but depend on models that are constructed by hand. By showing how to learn structured, meaningful, high-level models from data, RLeap will contribute to combining the benefits of data-based learners and model-based solvers, a combination that is necessary for building AI systems that are robust, explainable, and trustworthy.

The technical objectives of the project are the following:

Objective 1: Learning representations for planning. This involves discovering the representation of states in terms of objects, relations, and general action schemas; that is, a first-order action model able to predict the possible courses of action even when the number and configuration of objects change. The inputs for learning these planning representations can take many forms, from sample trajectories that convey information about the structure of the state space, to grid-based or image representations of the states themselves. This objective also involves learning an interpretation of external goal instructions that must be aligned with the learned internal representations.
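
To make the notion concrete, a first-order action model can be pictured as a STRIPS-style schema with parameters, preconditions, and effects. The following minimal Python sketch is purely illustrative (the names and representation are our own assumptions, not the project's code); it encodes the Blocksworld "stack" schema, which applies to any pair of blocks regardless of how many blocks an instance contains:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Atom:
        predicate: str
        args: tuple  # variable names like "?x", or objects once grounded

    @dataclass(frozen=True)
    class ActionSchema:
        name: str
        parameters: tuple         # e.g. ("?x", "?y")
        preconditions: frozenset  # atoms over the parameters
        add_effects: frozenset
        del_effects: frozenset

    # Blocksworld "stack": one first-order schema covering every grounding
    # stack(a, b), for any two blocks a and b, in any instance.
    stack = ActionSchema(
        name="stack",
        parameters=("?x", "?y"),
        preconditions=frozenset({Atom("holding", ("?x",)),
                                 Atom("clear", ("?y",))}),
        add_effects=frozenset({Atom("on", ("?x", "?y")),
                               Atom("clear", ("?x",)),
                               Atom("handempty", ())}),
        del_effects=frozenset({Atom("holding", ("?x",)),
                               Atom("clear", ("?y",))}),
    )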

Objective 2: Learning representations for generalized planning. The learned planning representations can be used to compute plans from scratch, but the effort of planning can be saved by learning general policies that can then be applied reactively, without having to think and plan in each situation. These general policies are more than plans: they are general strategies or skills.
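
As an illustration of what "more than a plan" means, here is a hand-written general policy for the classical clear(x) Blocksworld family, in a minimal Python sketch (the state encoding and helper names are hypothetical; a learned policy would be expressed over learned features instead):

    # State encoding (our assumption): "on" maps each block to the block
    # it sits on, or to "table"; "holding" is a block name or None.
    def is_above(on, b, x):
        """True if block b sits (transitively) above block x."""
        y = on[b]
        while y != "table":
            if y == x:
                return True
            y = on[y]
        return False

    def clear_x_policy(on, holding, x):
        """Solves clear(x) in ANY Blocksworld instance: put away whatever
        is held, otherwise unstack the top block above x."""
        if holding is not None:
            return ("putdown", holding)
        tops = [b for b in on
                if is_above(on, b, x) and all(on[c] != b for c in on)]
        if tops:
            return ("unstack", tops[0])
        return None  # clear(x) already holds

    # The same policy handles 3 blocks or 300; no planning at run time:
    print(clear_x_policy({"A": "B", "B": "C", "C": "table"}, None, "C"))
    # -> ('unstack', 'A')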

Objective 3: Learning hierarchical representations. General policies may have to be learned from scratch too, but often this is not necessary: new policies can be obtained by combining general policies (skills) that have already been learned. The composition of these skills gives rise to hierarchical representations of policies and skills. Even though a large number of works have addressed the problem of learning hierarchical representations, the computational principles that underlie hierarchical representations and problem decomposition are still to be uncovered.

Objective 4: Theory of representations for planning and learning. Learning high-level symbolic representations from unstructured data for acting and planning is the key goal of the project, but not the only one. Indeed, it is not enough to show experimentally that this is possible and can be done in a reliable and robust manner: we seek understanding. Namely, we need to understand the reasons that explain these results. Important theoretical questions are involved, such as the relation between the expressive power of certain types of neural networks and fragments of first-order logic, and between the complexity of problems and their decomposition into subproblems.
We have made significant progress on all of these objectives, as reflected in the publications list, including but not limited to:

1. Learning symbolic first-order action models from the structure of the state space and from parsed images. Currently, we are also developing methods for learning to interpret external goal instructions in terms of the learned internal representations, and are getting started with the problem of learning first-order action models directly from raw images.

2. Learning general policies without supervision from small problem instances, using both combinatorial solvers and deep learning approaches. We have also developed methods for learning general policies from small problem instances using a combination of policy optimization methods and graph neural networks (GNNs), where the learned general policies can be understood logically; moreover, when the policies do not generalize well, the failures can be understood logically too (a toy illustration of the message-passing mechanism appears after this list). Future work involves overcoming the current expressive limitations of GNNs and of the corresponding fragment of first-order logic (the fragment with two variables).

3. Developing a language for expressing general problem decompositions, called sketches, that can be used to write problem decompositions by hand or to learn them without supervision from data (small instances); a minimal encoding of sketch rules is shown after this list. More recently, we have built on these notions and methods to learn hierarchical policies from small instances, given the action models, using clear and crisp principles based on the notion of problem width. We also want to be able to learn sketches and hierarchical policies directly from data, and from richer action models able to handle continuous time and space, as in robotics applications.

4. Developing a mathematical framework for studying the soundness and completeness of learned general policies and sketches, and the complexity of sketch decompositions based on the notion of width (one standard formulation of width is recalled after this list).
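
The toy example promised in item 2: a bare-bones message-passing computation over the objects of a relational state. The "+1 / max" update below is a stand-in for learned message and update functions, chosen only to make the mechanism visible; with on(x, y) atoms read as edges, each block's embedding ends up counting the blocks above it, which is precisely the kind of numerical feature a general policy like the clear(x) policy above needs:

    def gnn_embeddings(objects, atoms, rounds=3):
        """Toy GNN: max-aggregation message passing with a fixed update.
        atoms is a set of (predicate, (src, dst)) binary atoms."""
        emb = {o: 0.0 for o in objects}
        for _ in range(rounds):
            emb = {o: max([emb[o]] + [emb[src] + 1.0
                                      for _p, (src, dst) in atoms
                                      if dst == o])
                   for o in objects}
        return emb

    state = {("on", ("A", "B")), ("on", ("B", "C"))}  # A on B on C
    print(gnn_embeddings({"A", "B", "C"}, state))
    # -> {'A': 0.0, 'B': 1.0, 'C': 2.0}  (key order may vary)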
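
For item 3, a minimal and again hypothetical encoding of sketch rules as pairs C -> E over features: C constrains the current feature values, and E says how they must change for a state to count as a subgoal. A single rule over the feature n (the number of blocks above the target) decomposes clear(x) into a series of easy "remove one more block" subproblems:

    sketch = [
        # C: n > 0          E: n decreases
        ({"n": "gt0"}, {"n": "dec"}),
    ]

    def satisfies(rule, feats_now, feats_next):
        """Does the transition feats_now -> feats_next follow rule C -> E?"""
        cond, eff = rule
        ok_c = all(feats_now[f] > 0 for f, v in cond.items() if v == "gt0")
        ok_e = all(feats_next[f] < feats_now[f]
                   for f, v in eff.items() if v == "dec")
        return ok_c and ok_e

    print(satisfies(sketch[0], {"n": 3}, {"n": 2}))  # -> True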
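
Finally, the notion of width used in item 4, in one standard formulation (after Lipovetzky and Geffner's work on width-based search; recalled here as background, not as the project's exact definition):

    % Width of a planning problem (one common formulation).
    A problem $P$ has width $w(P) \le k$ iff there is a chain of atom tuples
    $t_0, t_1, \ldots, t_m$, each containing at most $k$ atoms, such that:
    (1) $t_0$ holds in the initial state of $P$;
    (2) every optimal plan for $t_i$ can be extended into an optimal plan
        for $t_{i+1}$ by appending a single action; and
    (3) every optimal plan for $t_m$ is an optimal plan for $P$.
    The IW($k$) procedure then solves any problem of width at most $k$
    in time and space exponential in $k$ only.

The connection to item 3 is that a good sketch splits a problem of high width into subproblems of bounded width, each of which can be solved cheaply by width-based search.
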
We have developed languages for expressing action models, general policies, and hierarchical representations, and we have developed methods for learning them without supervision using both combinatorial and deep learning methods. In the case of action models, we took the languages "off-the-shelf" (namely, STRIPS and PDDL), but in the second half of the project we will look at more suitable and expressive alternatives. This is all novel and beyond the state of the art; yet the main contribution is a methodology for addressing these problems, step by step, in a systematic way, leaving behind solid concepts and techniques. These concepts and techniques will be the building blocks for addressing the next round of challenges in the project: dealing with raw data in the form of images for learning action models, general policies, and problem decompositions, and dealing with continuous action and state spaces.

We are well aware that current deep reinforcement learning methods, and in particular policy optimization methods, manage to deal quite smoothly with both images and continuous action and state spaces. Yet these methods, by themselves, produce representations that are not meaningful, general, transparent, or reusable. The goal of RLeap is to develop a methodology for obtaining the best of both worlds: learning systems that produce meaningful, flexible, and reusable high-level representations from low-level, raw data. In this sense, deep learning and deep reinforcement learning will be part of the solution, but they won't be the full solution. In particular, state representations are to be learned over high-level, logical languages, or it should at least be possible to understand them in such terms.

By the end of the project, we would be happy to be able to learn high-level, meaningful, and transparent action models, general policies, and general problem decompositions (sketches) in two domains: video games, like the Atari games, and robotics. The first involves learning high-level representations from raw pixels; the second involves, in addition to cameras, the control of a robot arm that operates in continuous action and state spaces. If we manage to learn in those domains what we currently manage to learn in STRIPS-like settings, the goals of the project will be accomplished. This is the agenda. A lot has been learned, and a lot remains to be learned.