CORDIS - EU research results

Convergent Human Learning for Robot Skill Generation

Final Report Summary - CONVERGE (Convergent Human Learning for Robot Skill Generation)

Robot programming is one of the bottlenecks in moving robots from factories into our daily lives. When a new skill is desired, expert knowledge is needed to implement it on the target robot. There is a growing effort to develop systems that can learn by themselves or by observing a demonstrator. When direct measurement of the demonstrator’s motor variables (e.g. joint angles) is available, the problem can be solved by designing a mapping from the observed values to the target robot’s actuators. However, due to kinematic and dynamic differences between the observer and the demonstrator, this is not always trivial. Another approach is kinesthetic demonstration, i.e. actively moving the robot through via points to reach a desired behavior, which, however, is not applicable to tasks with non-negligible dynamics. Recently, a robot skill generation framework was proposed that circumvents these limitations by relying on the sensorimotor learning capacity of the human central nervous system (Oztop, Lin et al. 2006, Oztop, Lin et al. 2007, Babic, Hale et al. 2011, Moore and Oztop 2012). In this framework, the operator is put in the control loop of a robotic system, where (s)he controls the robot in real time. The operator then ‘learns’ to make the robot perform a given task. After the human becomes an expert in this task, the signals entering and leaving the robot are used to construct an autonomous controller. The key point of this framework is that it takes the work away from the cognitive system of an expert and puts it on a layperson’s sensorimotor system.

The Converge project aims to improve the efficacy of this framework by allowing the human and the robot to learn simultaneously and work together as a team. To this end, several human-in-the-loop learning setups have been developed, and two main research directions have been pursued: (1) simultaneous learning for autonomy and (2) human-robot shared control that surpasses the performance of the individual agents alone. For the former, simultaneous and sequential learning experiments have been conducted with a cart-pole system on the swing-up and balance task. The results indicate that, within the simultaneous learning framework, convergent learning is possible with dynamic control sharing, and that the obtained autonomous controllers perform better than those from the usual sequential human-in-the-loop learning (Zamani and Oztop 2015). Although simultaneous learning initially feels harder for naïve subjects, they can still generate policies with improved performance. With longer practice, it is possible to exploit the simultaneous learning framework to generate successful autonomous policies that can do the full ‘swing-up and pole balance’ task. The same framework has also been applied to the so-called ‘ball and beam’ task, where the goal is to speed up a ball placed on a rail (by raising or lowering one end) such that it never hits either end. The results show that a smooth shift from full human guidance to shared control and finally to autonomous control can be achieved effectively. A similar transition from full teleoperation to full autonomy has also been demonstrated on the ‘ball balancing’ task (described below), executed through an anthropomorphic robotic arm with dynamic control sharing.
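
The shift from full human guidance through shared control to autonomy can be pictured as a weighted blend of two commands on the same control channel. The following is a minimal sketch and not the project's implementation: the linear blend, the names `blend_command` and `update_share`, the `rate` and `tolerance` values, and the rule of raising the robot's share while its command agrees with the human's are all assumptions made for illustration.

```python
def blend_command(u_human: float, u_robot: float, robot_share: float) -> float:
    """Blend two commands that act on the same control channel.

    robot_share = 0.0 is full teleoperation, 1.0 is full autonomy.
    """
    if not 0.0 <= robot_share <= 1.0:
        raise ValueError("robot_share must lie in [0, 1]")
    return (1.0 - robot_share) * u_human + robot_share * u_robot


def update_share(robot_share: float, u_human: float, u_robot: float,
                 rate: float = 0.05, tolerance: float = 0.1) -> float:
    """Hypothetical sharing rule: hand control over while the learner's
    command stays close to the human's, and take it back otherwise."""
    agrees = abs(u_human - u_robot) <= tolerance
    step = rate if agrees else -rate
    # Clamp so the share always remains a valid weight.
    return min(1.0, max(0.0, robot_share + step))


if __name__ == "__main__":
    share = 0.0
    for t in range(40):
        u_h = 1.0                # toy human command
        u_r = 1.0 - 0.9 ** t     # toy learner output approaching the human's
        u_net = blend_command(u_h, u_r, share)
        share = update_share(share, u_h, u_r)
        print(f"t={t:2d} share={share:.2f} u_net={u_net:.3f}")
```

In this toy run the robot's share stays at zero while its command is far from the human's and then grows step by step, which mimics the move from teleoperation towards autonomy described above.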

In the second focus area, shared control, anthropomorphic robots are used almost exclusively. In particular, ‘ball balancing’ is studied in detail because of its intuitive nature. The ball balancing task requires moving the robot hand so as to balance a ball placed on a tray attached to the end effector of the robot. First, an autonomous robot policy is synthesized using the human-in-the-loop learning (sequential learning) framework and then kept constant in order to study shared control, i.e. the human adaptation and the performance of the human-robot system as a whole. To address a richer class of problems, the ball balancing task is defined as balancing the ball at a given location on the tray. The correct policy is therefore goal dependent, and effective shared control requires this goal to be shared between the human and the robot. We have studied the case where the robot is not informed of the goal and has to infer it by observing the movements of its partner, i.e. the human operator. In the experiments conducted, the robot is given a simple human intention estimation mechanism and a constant level of control sharing. The results indicate that human adaptation creates a shared control system that emergently exploits the best parts of the human and the robot control. Concretely, when the robot and the human are coupled, a symbiotic system is formed that can balance the ball faster than the human alone and with higher accuracy than the robot alone (Amirshirzad, Kaya et al. 2016).
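
The combination of goal inference and constant control sharing can be illustrated with a one-dimensional toy. This is a sketch under stated assumptions rather than the mechanism used in the experiments: the candidate goal list, the direction-based inference rule in `infer_goal`, the proportional `robot_policy` with its `gain`, and the fixed `robot_share` of 0.5 are all hypothetical.

```python
from typing import Sequence


def infer_goal(ball_pos: float, human_cmd: float,
               candidate_goals: Sequence[float]) -> float:
    """Guess the human's goal: the nearest candidate lying in the
    direction the human is pushing; fall back to the nearest overall."""
    ahead = [g for g in candidate_goals if (g - ball_pos) * human_cmd > 0.0]
    pool = ahead if ahead else list(candidate_goals)
    return min(pool, key=lambda g: abs(g - ball_pos))


def robot_policy(ball_pos: float, goal: float, gain: float = 2.0) -> float:
    """Stand-in for the fixed autonomous policy: proportional control."""
    return gain * (goal - ball_pos)


def shared_command(ball_pos: float, human_cmd: float,
                   candidate_goals: Sequence[float],
                   robot_share: float = 0.5) -> float:
    """Net command with a constant sharing weight between the partners."""
    goal = infer_goal(ball_pos, human_cmd, candidate_goals)
    u_robot = robot_policy(ball_pos, goal)
    return (1.0 - robot_share) * human_cmd + robot_share * u_robot


if __name__ == "__main__":
    goals = [-0.2, 0.1, 0.3]
    print(infer_goal(0.0, 1.0, goals))
    print(shared_command(0.0, 1.0, goals))
```

Because the robot's contribution depends on its current guess of the goal, the human faces a partner whose behavior changes during the trial, which is the non-stationarity the human has to adapt to.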

When a human operator is involved in a shared control scenario such as the ball balancing task, (s)he needs to interact with a non-stationary system that tries to predict the human’s goal and intervenes in the control based on its prediction. It is therefore not clear how long it will take the human to adapt to the system and achieve high performance. To investigate human adaptation, extensive experiments have been performed using the ball balancing task. One group teleoperated a robot that does not intervene in the control (human control condition); the other group controlled a robot that performs intention inference and contributes to the net control command as described above (shared control condition). The results indicate that human learning proceeds faster in the shared control condition, as measured by three performance criteria: task completion time, length of the ball trajectory, and positional error of the ball. This is interesting because, even though subjects initially had to deal with a non-stationary partner, they soon learned to exploit the robot partner to achieve high overall performance.
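
The three performance criteria can be computed from a recorded ball trajectory. The sketch below is illustrative only: the report does not give the exact definitions, so the settling `radius`, the sampling interval `dt`, and the choice to define completion time as the moment after which the ball stays within `radius` of the goal are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def trajectory_length(path: Sequence[Point]) -> float:
    """Total distance travelled by the ball along the sampled path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def final_position_error(path: Sequence[Point], goal: Point) -> float:
    """Distance between the last recorded ball position and the goal."""
    return math.dist(path[-1], goal)


def completion_time(path: Sequence[Point], goal: Point, dt: float,
                    radius: float = 0.01) -> Optional[float]:
    """Time after which the ball remains within `radius` of the goal.

    Returns None when the ball is still outside the radius at the end.
    """
    settle_index = 0
    for i, p in enumerate(path):
        if math.dist(p, goal) > radius:
            settle_index = i + 1  # the ball cannot have settled before here
    if settle_index >= len(path):
        return None
    return settle_index * dt


if __name__ == "__main__":
    demo = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
    print(trajectory_length(demo))
    print(final_position_error(demo, (3.0, 4.0)))
    print(completion_time(demo, (3.0, 4.0), dt=0.1))
```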

One effective shared control mechanism that can be used to generate autonomous control policies is ‘heterogeneous control sharing’, where the control channels of the robot and the human are chosen to be orthogonal. In the previous human-robot control settings, the robot and the human commanded the same control channel, so a ‘weight’ or arbitration between the human and the robot was necessary. However, no such weighting mechanism is needed in heterogeneous control sharing, making it a good choice for robot skill synthesis when such a control split can be made. To realize a heterogeneous control sharing system, the so-called ‘ball swapping task’ (Moore and Oztop 2012) was chosen, which requires a robot hand with dexterous fingers to swap the positions of a pair of balls held in the hand. The robot finger movements are given the basic autonomy of following an open-loop sine wave with a constant phase difference between the fingers. On its own, this is not at all sufficient to swap the balls. A human operator entered the control loop by commanding the robot arm (which has the robot hand mounted as its end effector), generating the desired position and orientation commands for the hand. Although it was initially not clear whether a human operator could learn to control the robot arm in response to finger and ball states to achieve ball swapping, within a few days the human operator discovered an arm policy that facilitates the ball swap, which was then used to synthesize a fully autonomous ball swapping performance.
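
The split of control channels in the ball swapping setup can be sketched as follows. This is a hedged illustration, not the controller used in the project: the number of fingers, `amplitude`, `frequency_hz`, `phase_step`, `offset`, and the names `finger_references` and `combined_command` are hypothetical. The only features taken from the description above are that the fingers follow an open-loop sine wave with a constant phase difference, and that the human alone commands the arm pose.

```python
import math
from typing import Dict, List, Tuple

# Hand pose commanded by the human: x, y, z, roll, pitch, yaw.
Pose = Tuple[float, float, float, float, float, float]


def finger_references(t: float, n_fingers: int = 4, amplitude: float = 0.3,
                      frequency_hz: float = 1.0,
                      phase_step: float = math.pi / 2,
                      offset: float = 0.5) -> List[float]:
    """Open-loop finger targets: the same sine wave for every finger,
    each delayed by a constant phase step relative to its neighbor."""
    omega = 2.0 * math.pi * frequency_hz
    return [offset + amplitude * math.sin(omega * t - k * phase_step)
            for k in range(n_fingers)]


def combined_command(t: float, human_arm_pose: Pose) -> Dict[str, object]:
    """Heterogeneous sharing: the channels are disjoint, so the two
    commands are simply placed side by side with no arbitration weight."""
    return {
        "arm_pose": human_arm_pose,              # human-controlled channel
        "finger_angles": finger_references(t),   # robot-controlled channel
    }


if __name__ == "__main__":
    pose = (0.4, 0.0, 0.3, 0.0, 0.0, 0.0)
    for step in range(5):
        print(combined_command(step * 0.1, pose))
```

Since neither agent writes to the other's channel, there is nothing to weigh or arbitrate, which is what distinguishes this arrangement from the blended commands of the earlier settings.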

Overall, this project studied several aspects of human-in-the-loop robot control systems. In particular, it addressed simultaneous learning of the robot and the human for autonomous policy synthesis, and shared control for obtaining a synergistic performance that surpasses what the individual agents can achieve alone. It is becoming increasingly clear that, for a robot-enabled society, equipping robots with mechanisms that let them operate symbiotically with humans is at least as critical as robot autonomy and dexterity. This work has taken a step towards developing technologies that target not only autonomous skill generation but also shared control mechanisms that synergistically combine the skills of robots and humans for better performance. The study of human sensorimotor adaptation for robot control and robot learning is a rich research avenue that may lead to paradigm shifts in how robots are programmed, tested and deployed in everyday life. In particular, we may see robot training emerge as a new employment area in which employees use their sensorimotor adaptation capacity to train robots. Indeed, expert crane operators, who must tune their sensorimotor skills to perform precisely and robustly, are always in high demand. Likewise, the need for master remote flight operators for unmanned aerial vehicles is on the rise. It is not difficult to imagine that, as a socio-economic impact of the research direction to which the Converge project has contributed, today’s expert operators will become the ‘robot trainers’ of the near future.