CORDIS - EU research results

Policy Learning of Motor Skills for Humanoid Robots

Periodic Reporting for period 4 - SKILLS4ROBOTS (Policy Learning of Motor Skills for Humanoid Robots)

Reporting period: 2020-01-01 to 2021-06-30

Current robots are largely play-back devices that repeat the same pre-programmed trajectory millions of times with little or no sensor-based intervention, and no adaptation at all. To become more useful, future robots will need to perform thousands of tasks, usually only a few times each. To make progress towards this goal, SKILLS4ROBOTS addresses the problem of how skill learning for robotics can be realized at an algorithmic level. To address this problem in a useful manner, we have formalized it as a machine learning problem that differs substantially from off-the-shelf machine learning problems with respect to the following issues: i) Active learning: all data is generated by the robot, either with the help of a demonstrator or by trial and error. ii) Little data: we never have more than a few hundred trial runs. iii) Safety: the algorithm may endanger neither the robot nor the operator.


The objective of SKILLS4ROBOTS is to develop an autonomous skill learning system that enables anthropomorphic robots to acquire and improve a rich set of motor skills. This robot skill learning system will allow motor abilities to scale up to such robots while overcoming the limitation of current skill learning systems to only a few degrees of freedom. To achieve this goal, it decomposes complex motor skills into simpler elemental movements, called movement primitives, that serve as building blocks for the higher-level movement strategy, and the resulting architecture will be able to address arbitrary, highly complex tasks. Learned primitives will be superimposed, sequenced and blended. The resulting decomposition into building blocks is not only inherent to many motor tasks but also highly scalable, and it will be exploited by our learning system. Four core objectives underlie our research work: we aim (1) to develop robot learning algorithms that are capable of parsing demonstrated behavior into modular policies consisting of a gating network and many elementary actions, (2) to generalize the resulting modular movement policy to maximize its applicability while minimizing the number of modular templates, (3) to efficiently self-improve the modular policy by trial-and-error learning on the real system, and (4) to evaluate on a variety of complex motor skills with a powerful humanoid robot, which is crucial both for gaining insight into the problems faced and for validating the underlying ideas.
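To make the notion of superimposing and blending primitives concrete, the following is a minimal, hypothetical sketch (all names are illustrative assumptions, not the project's actual code) of a modular policy in which a softmax gating network blends the outputs of several movement primitives:

```python
import numpy as np

def primitive_output(weights, basis):
    """One primitive: desired output as a weighted sum of basis functions."""
    return basis @ weights

def gating(features, gate_params):
    """Softmax gating network: activation probability for each primitive."""
    logits = gate_params @ features
    logits = logits - logits.max()          # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def modular_policy(features, basis, primitive_weights, gate_params):
    """Blend all primitives according to the gating activations."""
    activations = gating(features, gate_params)                              # (K,)
    outputs = np.array([primitive_output(w, basis) for w in primitive_weights])  # (K, D)
    return activations @ outputs            # convex combination of primitive outputs
```

Sequencing would correspond to the gating activations switching over time, while blending corresponds to intermediate activation values.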
From its beginning, the research in SKILLS4ROBOTS has aimed at composing complex movements from learned elements. From the start of the project to the end of the period covered by this report, we have worked on learning methods that generalize behaviors beyond those contained in any training data set obtained by observation and experimentation. All work focuses both on analytical algorithmic work and on real robot experimentation. In this context, we have made progress on the following aspects.

(1) Development of robot learning algorithms that are capable of parsing demonstrated behavior into modular policies.

We have been developing and refining novel approaches for efficient modular imitation learning of demonstrated behavior with simultaneous and sequenced motor primitives, in order to initialize our modular control policy by imitation learning. In particular, we have developed a highly successful novel method for automatically extracting the elemental building blocks, in the form of our probabilistic movement primitives (ProMPs), jointly with the hierarchical activation policy from demonstrated behavior. This method relies on variational expectation maximization to first generate an over-segmentation of the demonstrated behavior, followed by a clustering step that combines the resulting probabilistic movement primitives into more complex behavior. The resulting methods work well for extracting various skills, including robot assembly tasks, robot table tennis movements and robot tic-tac-toe.
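As a rough illustration of the primitive representation itself, here is a simplified sketch of fitting a ProMP to demonstrations: each trajectory is projected onto radial basis functions, and the primitive is the Gaussian distribution over the resulting weight vectors. This is a plain maximum-likelihood fit for a single primitive, not the variational EM segmentation-and-clustering procedure described above; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def rbf_features(T, n_basis=10, width=0.02):
    """Normalized radial basis functions over normalized time [0, 1]."""
    t = np.linspace(0, 1, T)[:, None]
    centers = np.linspace(0, 1, n_basis)[None, :]
    phi = np.exp(-(t - centers) ** 2 / (2 * width))
    return phi / phi.sum(axis=1, keepdims=True)   # (T, n_basis)

def fit_promp(demos, n_basis=10):
    """Fit a weight distribution N(mu, Sigma) from demonstrated trajectories."""
    ws = []
    for y in demos:                               # y: (T,) trajectory
        phi = rbf_features(len(y), n_basis)
        w, *_ = np.linalg.lstsq(phi, y, rcond=None)  # least-squares projection
        ws.append(w)
    W = np.array(ws)
    return W.mean(axis=0), np.cov(W, rowvar=False)

def mean_trajectory(mu, T):
    """Reconstruct the mean trajectory encoded by the weight mean."""
    return rbf_features(T, len(mu)) @ mu
```

The covariance over weight vectors is what makes the primitive probabilistic: it captures the variability across demonstrations rather than a single trajectory.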

(2) Generalization of experience to novel situations via motor primitives adaptation.

We have studied the generalization of previously observed behavior to new situations. To accomplish this goal, we have developed new approaches that allow our movement primitives to be adapted more easily to a new situation. Such modulation of the primitives is accomplished through highly task-dependent meta-parameters.
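One standard way such adaptation can work for probabilistic movement primitives is to condition the weight distribution on a desired via-point, which is ordinary Gaussian conditioning. The sketch below illustrates that mechanism only; the function and variable names are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def condition_promp(mu, Sigma, phi_t, y_star, obs_noise=1e-4):
    """Condition the weight distribution N(mu, Sigma) on the scalar
    observation phi_t @ w = y_star (a desired via-point at one time step)."""
    phi_t = np.asarray(phi_t)                  # basis activations at time t*, (n_basis,)
    s = phi_t @ Sigma @ phi_t + obs_noise      # predictive variance at t*
    k = Sigma @ phi_t / s                      # Kalman-style gain
    mu_new = mu + k * (y_star - phi_t @ mu)    # shift mean toward the via-point
    Sigma_new = Sigma - np.outer(k, phi_t @ Sigma)  # reduce uncertainty at t*
    return mu_new, Sigma_new
```

After conditioning, the mean trajectory passes (approximately) through the desired via-point while the rest of the movement retains the demonstrated shape.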

(3) Efficient model-based reinforcement learning of all elements of the modular policy.

The modular control strategy initialized by imitation learning can never exceed the quality of the demonstrations provided by the teacher and potentially suffers from lacking the teacher's prior knowledge. To improve an existing strategy, novel reinforcement learning approaches are needed. This problem is known to be among the hardest in machine learning, and it is even harder within robot skill learning. When starting the work from the proposal, we initially encountered unexpected difficulties due to the maximum likelihood nature of the proposed expectation-maximization-based reinforcement learning approaches. This initial problem led us to a major discovery on limiting the information loss in policy updates using an f-divergence regularizer. This re-formulation not only explains previous approaches from the literature but, most importantly, has created a toolbox for automatically generating new reinforcement learning algorithms.
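The idea of limiting information loss in a policy update can be sketched, for illustration only, with a KL-regularized episodic update (the KL divergence being one instance of an f-divergence) in the spirit of relative-entropy policy search: sampled policy parameters are reweighted by exponentiated returns, and a temperature controls how far the new policy may move from the old one. The fixed temperature and all names are assumptions, not the project's actual algorithm.

```python
import numpy as np

def reps_weights(returns, eta):
    """Exponential reweighting of sampled parameters; a larger temperature
    eta keeps the new policy closer to the old sampling distribution."""
    adv = returns - returns.max()      # shift for numerical stability
    w = np.exp(adv / eta)
    return w / w.sum()

def update_gaussian_policy(samples, returns, eta):
    """Weighted maximum-likelihood fit of a Gaussian over policy parameters."""
    w = reps_weights(returns, eta)
    mu = w @ samples                   # weighted mean of sampled parameters
    diff = samples - mu
    Sigma = (w[:, None] * diff).T @ diff   # weighted covariance
    return mu, Sigma
```

In a full relative-entropy formulation, eta would be found by solving a dual optimization so that the update satisfies a hard bound on the divergence, rather than being fixed by hand as here.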

(4) Validation on high-dimensional robots.

We have consistently evaluated our learning system in both simulated and real robot motor skill learning scenarios on anthropomorphic robots of different morphology and consistently high dimensionality. On the real robots, we have worked on a large number of tasks, including robot table tennis strokes, robot assembly tasks, robot tic-tac-toe, robot obstacle avoidance within human-robot interaction, robot catching, robot throwing and robot juggling. This validation leads to a particularly high impact of the presented approaches.
The progress beyond the state of the art includes: (i) robot learning algorithms that are capable of parsing demonstrated behavior into the underlying elementary actions and that enable the identification of the executed motor primitives; to complete this task and learn a modular policy that composes such elementary actions or probabilistic movement primitives, a gating network policy is required; (ii) novel algorithms for learning movement patterns that yield probabilistic grammars of behavior, which describe the activation patterns of demonstrated behavior and yield a behavior activation policy whose symbols are the previously learned motor primitives; (iii) novel insight into designing reinforcement learning approaches, which is of crucial importance for a reinforcement learning architecture that is able to learn on all levels of the modular control policy and is also data-efficient enough to be applicable on the real robot; and (iv) impressive validation, particularly within high-speed robotics.
(Image caption: Simplest task: robot passing a flower)