Periodic Reporting for period 4 - SKILLS4ROBOTS (Policy Learning of Motor Skills for Humanoid Robots)
Reporting period: 2020-01-01 to 2021-06-30
The objective of SKILLS4ROBOTS is to develop an autonomous skill learning system that enables anthropomorphic robots to acquire and improve a rich set of motor skills. This robot skill learning system will allow motor abilities to scale up to such robots while overcoming the current limitation of skill learning systems to only a few degrees of freedom. To achieve this goal, it decomposes complex motor skills into simpler elemental movements, called movement primitives, that serve as building blocks for the higher-level movement strategy, and the resulting architecture will be able to address arbitrary, highly complex tasks. Learned primitives will be superimposed, sequenced and blended. The resulting decomposition into building blocks is not only inherent to many motor tasks but also highly scalable, and it will be exploited by our learning system. Four core objectives underlie our research work: we aim (1) to develop robot learning algorithms that are capable of parsing demonstrated behavior into modular policies consisting of a gating network and many elementary actions, (2) to generalize the resulting modular movement policy to maximize its applicability while minimizing the number of modular templates, (3) to efficiently self-improve the modular policy by trial-and-error learning on the real system, and (4) to evaluate the system on a variety of complex motor skills with a powerful humanoid robot, which is crucial both for gaining insight into the problems faced and for validating the underlying ideas.
(1) Development of robot learning algorithms that are capable of parsing demonstrated behavior into modular policies.
We have been developing and refining novel approaches for efficient modular imitation learning of demonstrated behavior with simultaneous and sequenced motor primitives, in order to initialize our modular control policy by imitation learning. In particular, we have developed a highly successful novel method for automatically extracting the elemental building blocks, in the form of our probabilistic movement primitives (ProMPs), jointly with the hierarchical activation policy from demonstrated behavior. This method relies on variational expectation maximization to first generate an over-segmentation of the demonstrated behavior, followed by a clustering step that combines the resulting probabilistic movement primitives into more complex behavior. The resulting methods work well for extracting various skills, including robot assembly tasks, robot table tennis movements and robot tic-tac-toe.
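The core ProMP idea, representing a trajectory as a weighted combination of basis functions with a Gaussian distribution over the weights learned from demonstrations, can be sketched as follows. This is a minimal illustration only, not the project's actual implementation: the basis widths, regularizer, and function names are assumptions, and the segmentation/clustering machinery is omitted.

```python
import numpy as np

def rbf_basis(ts, n_basis=10, width=0.02):
    """Normalized radial basis features over a phase variable in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    feats = np.exp(-(ts[:, None] - centers[None, :]) ** 2 / (2.0 * width))
    return feats / feats.sum(axis=1, keepdims=True)

def fit_promp(demos, n_basis=10, reg=1e-6):
    """Fit a ProMP-style model: ridge regression gives one weight vector
    per demonstration; a Gaussian N(mu_w, Sigma_w) over those weight
    vectors captures the variability across demonstrations."""
    W = []
    for y in demos:  # each demo: 1-D array of positions over time
        ts = np.linspace(0.0, 1.0, len(y))
        Phi = rbf_basis(ts, n_basis)
        w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_basis), Phi.T @ y)
        W.append(w)
    W = np.array(W)
    return W.mean(axis=0), np.cov(W.T)

def promp_mean_traj(mu_w, n_steps=100):
    """Mean trajectory implied by the weight distribution."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return rbf_basis(ts, len(mu_w)) @ mu_w
```

As a usage example, fitting this to a handful of noisy demonstrations of the same reaching motion recovers a smooth mean trajectory plus a covariance that encodes where the demonstrations agree and where they vary.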
(2) Generalization of experience to novel situations via motor primitives adaptation.
We have conducted studies on the generalization of previously observed behavior to new situations. To accomplish this goal, we have developed new approaches that allow our movement primitives to be adapted more easily to a new situation. Such modulation of the primitives is accomplished through task-dependent meta-parameters.
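One standard mechanism by which probabilistic movement primitives can be adapted to a new situation is Gaussian conditioning on a desired via-point (e.g. a new target position). The sketch below assumes the weight-space view of a ProMP; the function name and observation-noise level are illustrative, not taken from the project's code.

```python
import numpy as np

def condition_promp(mu_w, Sigma_w, phi_t, y_star, sigma_y=1e-4):
    """Condition the Gaussian weight distribution N(mu_w, Sigma_w) on
    passing through y_star at the time step whose basis row is phi_t.
    This is plain Gaussian conditioning with a Kalman-style gain."""
    s = phi_t @ Sigma_w @ phi_t + sigma_y      # scalar innovation variance
    k = Sigma_w @ phi_t / s                    # gain vector
    mu_new = mu_w + k * (y_star - phi_t @ mu_w)
    Sigma_new = Sigma_w - np.outer(k, phi_t @ Sigma_w)
    return mu_new, Sigma_new
```

After conditioning, the primitive's mean trajectory passes (almost exactly) through the new via-point while staying close to the demonstrated behavior elsewhere, which is one concrete sense in which a small set of meta-parameters modulates the primitive.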
(3) Efficient model-based reinforcement learning of all elements of the modular policy.
The modular control strategy initialized by imitation learning can never exceed the quality of the demonstrations provided by the teacher and potentially suffers from lacking the teacher's prior knowledge. To improve an existing strategy, novel reinforcement learning approaches are needed. This problem is known to be among the hardest problems in machine learning, and it is even harder within robot skill learning. When starting the work from the proposal, we initially encountered unexpected difficulties due to the maximum likelihood nature of the proposed expectation-maximization-based reinforcement learning approaches. This initial problem led us to a major discovery on limiting the information loss in policy updates using an f-divergence regularizer. This re-formulation not only explains previous approaches from the literature but, most importantly, it has created a toolbox for automatically generating new reinforcement learning algorithms.
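The flavor of such information-loss-limited updates can be illustrated with the KL divergence, one member of the f-divergence family: samples are reweighted by exponentiated returns, and a temperature controls how far the new policy may move from the old one. This is a minimal episodic sketch under assumed names and a fixed temperature, not the project's actual algorithm (which derives the update from the regularized objective itself).

```python
import numpy as np

def kl_regularized_update(samples, returns, eta=1.0):
    """One episodic policy update with a KL-penalty flavor: sample
    weights p_i are proportional to exp(R_i / eta), so a larger
    temperature eta keeps the new policy closer to the old one.
    Returns the reweighted Gaussian (mean, covariance) over the
    policy parameters."""
    adv = returns - returns.max()          # shift for numerical stability
    w = np.exp(adv / eta)
    w /= w.sum()                           # normalized sample weights
    mu = w @ samples                       # weighted mean parameter vector
    diff = samples - mu
    Sigma = (w[:, None] * diff).T @ diff   # weighted covariance
    return mu, Sigma
```

Run on samples from an initial Gaussian with returns peaked elsewhere, the update shifts the mean toward high-return regions, but only partway, because the exponential weighting is equivalent to a soft trust region on the policy change.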
(4) Validation on high-dimensional robots.
We have consistently evaluated our learning system in both simulated and real-robot motor skill learning scenarios on anthropomorphic robots of different morphologies and consistently high dimensionality. On the real robots, we have worked on a large number of tasks, including robot table tennis strokes, robot assembly tasks, robot tic-tac-toe, robot obstacle avoidance in human-robot interaction, robot catching, robot throwing and robot juggling. This validation gives the presented approaches particularly high impact.