From the start of the project, the research in SKILLS4ROBOTS has aimed at composing complex movements from learned elements. Throughout the period covered by this report, we have worked towards learning methods that generalize behaviors beyond what is contained in any training data set obtained by observation and experimentation. All work combines analytical and algorithmic development with experiments on real robots. In this context, we have made progress on the following aspects.
(1) Development of robot learning algorithms that are capable of parsing demonstrated behavior into modular policies.
We have developed and refined novel approaches for efficient modular imitation learning from demonstrated behavior with simultaneous and sequenced motor primitives, which we use to initialize our modular control policy. In particular, we have developed a highly successful novel method for automatically extracting the elemental building blocks, in the form of our probabilistic movement primitives (ProMPs), jointly with the hierarchical activation policy from demonstrated behavior. This method relies on variational expectation maximization to first generate an over-segmentation of the demonstrated behavior, followed by a clustering step that combines the resulting probabilistic movement primitives into more complex behavior. The resulting methods work well for extracting various skills, including robot assembly tasks, robot table tennis movements and robot tic-tac-toe.
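To make the building block concrete, the following is a minimal sketch of how a single probabilistic movement primitive describes a one-dimensional trajectory: the position at phase z is modeled as y(z) = phi(z)^T w + noise, with Gaussian-distributed weights w. The function names, the choice of normalized Gaussian basis functions, the basis width, and the diagonal weight covariance are illustrative assumptions, not the project's implementation, and the segmentation and clustering steps are not shown.

```python
import math


def rbf_features(z, n_basis=5, width=0.02):
    """Normalized Gaussian basis functions evaluated at phase z in [0, 1].

    Assumption: centers are spread evenly over [0, 1]; n_basis must be >= 2.
    """
    centers = [k / (n_basis - 1) for k in range(n_basis)]
    raw = [math.exp(-((z - c) ** 2) / (2.0 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def promp_marginal(z, mean_w, var_w, obs_var=0.0):
    """Mean and variance of y(z) = phi(z)^T w + noise.

    Assumption: w ~ N(mean_w, diag(var_w)), i.e. a diagonal weight
    covariance, chosen here only to keep the sketch short.
    """
    phi = rbf_features(z, n_basis=len(mean_w))
    mean = sum(p * m for p, m in zip(phi, mean_w))
    var = sum(p * p * v for p, v in zip(phi, var_w)) + obs_var
    return mean, var


# Hypothetical weights for a movement that rises from 0 to 1.
mean_w = [0.0, 0.1, 0.5, 0.9, 1.0]
var_w = [0.01, 0.02, 0.05, 0.02, 0.01]
for z in (0.0, 0.5, 1.0):
    print(z, promp_marginal(z, mean_w, var_w))
```

Because the weights carry a distribution rather than a single value, the primitive captures the variability across demonstrations as well as the mean movement.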
(2) Generalization of experience to novel situations via motor primitive adaptation.
We have studied the generalization of previously observed behavior to new situations. To this end, we have developed new approaches that make our movement primitives more easily adaptable to a new situation. This modulation of the primitives is accomplished through highly task-dependent meta-parameters.
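The text does not specify the adaptation mechanism. One standard way to adapt a probabilistic movement primitive is Gaussian conditioning of its weight distribution on a desired via-point, where the via-point plays the role of a task-dependent meta-parameter. The sketch below illustrates this under stated assumptions: the function name, the feature vector `phi`, and the numbers are hypothetical, and the weight covariance is assumed symmetric.

```python
def condition_on_viapoint(mean_w, cov_w, phi, y_star, obs_var=1e-6):
    """Condition w ~ N(mean_w, cov_w) on the observation phi^T w = y_star.

    phi is the basis-function vector at the phase of the via-point and
    obs_var is the tolerated variance around y_star (both assumptions).
    """
    n = len(mean_w)
    # cov_phi = cov_w @ phi
    cov_phi = [sum(cov_w[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Scalar innovation variance: phi^T cov_w phi + obs_var
    s = sum(phi[i] * cov_phi[i] for i in range(n)) + obs_var
    gain = [c / s for c in cov_phi]
    residual = y_star - sum(phi[i] * mean_w[i] for i in range(n))
    new_mean = [mean_w[i] + gain[i] * residual for i in range(n)]
    # cov_w - gain * (phi^T cov_w), using the symmetry of cov_w
    new_cov = [[cov_w[i][j] - gain[i] * cov_phi[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Hypothetical two-weight primitive, adapted to pass through y = 1.0.
mean_w = [0.0, 0.0]
cov_w = [[1.0, 0.0], [0.0, 1.0]]
phi = [0.8, 0.2]
new_mean, new_cov = condition_on_viapoint(mean_w, cov_w, phi, y_star=1.0)
print(sum(p * m for p, m in zip(phi, new_mean)))  # close to 1.0
```

The adapted primitive stays a Gaussian over weights, so it can be conditioned again or combined with other primitives in the same way as the original.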
(3) Efficient model-based reinforcement learning of all elements of the modular policy.
The modular control strategy initialized by imitation learning can never exceed the quality of the demonstrations provided by the teacher and potentially suffers from lacking the teacher's prior knowledge. To improve an existing strategy, novel reinforcement learning approaches are needed. This problem is known to be among the hardest in machine learning, and it is even harder in robot skill learning. When starting the work from the proposal, we initially encountered unexpected difficulties due to the maximum-likelihood nature of the proposed expectation-maximization-based reinforcement learning approaches. This initial problem led us to a major discovery on limiting information loss in policy updates using an f-divergence regularizer. This reformulation not only explains previous approaches from the literature but, most importantly, has created a toolbox for automatically generating new reinforcement learning algorithms.
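As an illustration of a regularized policy update, the sketch below shows the Kullback-Leibler special case of an f-divergence penalty. Maximizing expected return minus eta times the KL divergence from the old policy yields a new policy proportional to the old one times exp(R / eta), so samples drawn from the old policy are reweighted by a softmax of their returns. The function names, the fixed temperature `eta`, and the sample values are assumptions for illustration; other f-divergences lead to different weighting functions that are not shown here.

```python
import math


def kl_regularized_weights(returns, eta):
    """Sample weights w_i proportional to exp(R_i / eta) (KL special case).

    eta is the penalty strength: a large eta keeps the new policy close
    to the old one, a small eta makes the update nearly greedy.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    r_max = max(returns)  # subtract the maximum for numerical stability
    unnorm = [math.exp((r - r_max) / eta) for r in returns]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_mean(samples, weights):
    """Weighted mean of parameter samples, used as the new policy mean."""
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples))
            for d in range(dim)]


# Hypothetical policy-parameter samples and the returns they achieved.
samples = [[0.0], [1.0], [2.0]]
returns = [1.0, 2.0, 3.0]
weights = kl_regularized_weights(returns, eta=1.0)
print(weights, weighted_mean(samples, weights))
```

Here the temperature is fixed by hand; treating the divergence as a hard constraint instead would require choosing it by solving a dual optimization problem.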
(4) Validation on high-dimensional robots.
We have consistently evaluated our learning system in both simulated and real robot motor skill learning scenarios on anthropomorphic robots of different morphology and consistently high dimensionality. On the real robots, we have worked on a large number of tasks, including robot table tennis strokes, robot assembly tasks, robot tic-tac-toe, robot obstacle avoidance within human-robot interaction, robot catching, robot throwing and robot juggling. This validation contributes to the particularly high impact of the presented approaches.