Periodic Reporting for period 4 - TERI (Teaching Robots Interactively)
Reporting period: 2023-08-01 to 2024-01-31
Current robot learning approaches focus either on imitation learning (mimicking the teacher’s movement) or on reinforcement learning (self-improvement by trial and error). Learning even moderately complex tasks in this way still requires infeasibly many iterations or task-specific prior knowledge that needs to be programmed into the robot. To render robot learning fast, effective, and efficient, we proposed to incorporate intermittent robot-teacher interaction, which has so far been largely ignored in robot learning although it is a prominent feature in human learning. This project has delivered a fundamentally new and more effective approach: robot learning no longer relies on initial demonstrations alone, but effectively uses additional user feedback to continuously optimize task performance. It enables the user to directly perceive and correct undesirable behavior and to quickly guide the robot toward the target behavior. The three-fold challenge of this project was to develop techniques that are theoretically sound, intuitive for the user, and efficient for real-world applications.
The novel framework has been validated with generic real-world robotic force-interaction tasks related to handling and (dis)assembly. The potential of the newly developed teaching framework has been demonstrated with challenging bi-manual tasks and a final study evaluating how well novice human operators can teach novel tasks to a robot.
Another focus was on using interactive feedback to reduce the user's burden in supplying demonstrations, and on extracting as much information as possible about the user's intention from the provided inputs, so as to reduce ambiguous interpretations on the side of the learning algorithm. We investigated ambiguities in three different scenarios: 1) learning complex trajectories whose goal depends on different reference frames (attached, for example, to objects in the environment) in different segments of the movement; 2) teaching complex movements where both trajectories and stiffness properties need to be learned; 3) learning controllers that need to rely on different sensor modalities in different situations. We proposed methods that use priors and interactive feedback to resolve these ambiguities during inference without any explicit programming and with a reduced number of complete demonstrations. We evaluated different feedback modalities ranging from corrections to ratings, as well as different interaction modalities ranging from the robot asking for advice to the human teacher being the driving force, also combining multiple modalities in one joint framework.
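To give a flavor of how corrective feedback can refine a demonstrated motion without a new full demonstration, the following is a minimal sketch (not the project's actual algorithm): a teacher's local nudge at one point of a learned trajectory is propagated to neighboring points with a Gaussian weight, so a single correction smoothly reshapes the motion. The function name, the Gaussian weighting, and the `width` parameter are illustrative assumptions.

```python
import numpy as np

def apply_correction(trajectory, t_idx, correction, width=5.0):
    """Blend a local human correction into a learned trajectory.

    A correction given at time index t_idx is spread to neighboring
    waypoints with a Gaussian weight, so one nudge reshapes the motion
    smoothly instead of introducing a kink.
    """
    traj = np.asarray(trajectory, dtype=float).copy()
    idx = np.arange(len(traj))
    # Weight 1.0 at the corrected waypoint, decaying with distance.
    weights = np.exp(-0.5 * ((idx - t_idx) / width) ** 2)
    traj += weights[:, None] * np.asarray(correction, dtype=float)
    return traj

# A "demonstrated" straight-line trajectory of 50 points in 2-D.
demo = np.stack([np.linspace(0.0, 1.0, 50), np.zeros(50)], axis=1)

# The teacher pushes the robot 0.2 m in +y near the middle of the motion.
updated = apply_correction(demo, t_idx=25, correction=[0.0, 0.2])
```

In a real interactive-learning loop, such incremental corrections would update the underlying movement model rather than a single rollout, but the sketch illustrates why a few local inputs can be far cheaper for the user than re-demonstrating the whole task.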
The developed approaches were benchmarked against state-of-the-art approaches, evaluated with human volunteers, and we demonstrated them in various tasks on real robot arms and mobile robots.
The project has resulted in numerous novel methods published in scientific papers. We also published an extensive survey on interactive imitation learning. The ideas have been disseminated through invited talks at conferences, workshops, companies, and universities, by organizing scientific workshops on the topic, and by applying the methods in competitions and more application-oriented projects.