
Robots Understanding Their Actions by Imagining Their Effects

Periodic Reporting for period 2 - IMAGINE (Robots Understanding Their Actions by Imagining Their Effects)

Reporting period: 2018-01-01 to 2019-08-31

Today's robots are good at executing programmed motions, but they cannot automatically generalize them to novel situations or recover from failures. IMAGINE seeks to enable robots to understand the structure of their environment and how it is affected by their actions. "Understanding" here means the robot's ability to determine whether an action is applicable and with which parameters it achieves the desired effect, to discern to what extent an action succeeded, and to infer possible causes of failure and generate recovery actions.

The core functional element is a generative model based on an association engine and a physics simulator. "Understanding" is given by the robot's ability to predict the effects of its actions. This allows the robot to choose actions and parameters based on their simulated performance, and to monitor their progress by comparing observed to simulated behavior.

This scientific objective is pursued in the context of recycling of electromechanical appliances. Current recycling practices do not automate disassembly, which exposes humans to hazardous materials, encourages illegal disposal, and creates significant threats to environment and health, often in third countries. IMAGINE will develop a TRL-5 prototype that can autonomously disassemble prototypical classes of devices, generate and execute disassembly actions for unseen instances of similar devices, and recover from certain failures.

IMAGINE raises the ability level of robotic systems in core areas of the work programme, including adaptability, manipulation, perception, decisional autonomy, and cognitive ability. Since only one-third of EU e-waste is currently recovered, IMAGINE addresses an area of high economic and ecological impact.
We have implemented a demonstrator targeting the disassembly of computer hard drives. A real scene is analyzed, opportunities for actions are detected, and an action plan is reactively generated using information from simulated and real interaction. Robot actions can be executed by the multi-functional gripper developed in the project, though the gripper is not yet integrated into the system.

We have developed a complete pipeline (see the figure) for the reconstruction of scene models from visual perception, covering component recognition, component segmentation, surface reconstruction from stereo, registration of other views with reconstructed surfaces, mesh reconstruction, and real-time voxelization.
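As an illustration of the pipeline's final step, occupancy-grid voxelization of a reconstructed point cloud can be sketched in a few lines. The function name and parameters below are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def voxelize(points, voxel_size=0.005):
    """Map an (N, 3) point cloud to a set of occupied voxel indices.

    Each point is binned into a cube of side `voxel_size`; the set of
    unique integer cells is a compact occupancy representation that can
    be updated in real time as new points arrive.
    """
    cells = np.floor(np.asarray(points) / voxel_size).astype(int)
    return {tuple(c) for c in cells}

# Two nearby points fall into the same 5 mm voxel; the third is elsewhere.
grid = voxelize([[0.001, 0.002, 0.003],
                 [0.002, 0.001, 0.004],
                 [0.032, 0.032, 0.032]])
```

A hash-set of occupied cells keeps memory proportional to the occupied volume rather than the full grid extent.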

We defined action descriptors (ADES) that formalize robot actions in terms of their qualitative and quantitative pre- and postconditions as well as their trainable motor parameters, and implemented ADES for levering, pushing, and unscrewing.
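A minimal sketch of what an ADES-style record might look like follows; the field names, types, and the example `lever` action are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Ades:
    """Illustrative action descriptor: qualitative/quantitative pre- and
    postconditions plus trainable motor parameters (hypothetical layout)."""
    name: str
    preconditions: Dict[str, Callable[[dict], bool]]  # named scene predicates
    postconditions: Dict[str, object]                 # expected effects
    motor_params: Dict[str, float] = field(default_factory=dict)  # trainable

    def applicable(self, scene: dict) -> bool:
        # The action is applicable when every precondition holds in the scene.
        return all(check(scene) for check in self.preconditions.values())

lever = Ades(
    name="lever",
    preconditions={"gap_visible": lambda s: s.get("gap_width", 0.0) > 0.002},
    postconditions={"lid_detached": True},
    motor_params={"insertion_depth": 0.01, "lever_angle": 0.6},
)
```

Keeping preconditions as named predicates lets a planner both test applicability and report which condition failed.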

Given raw images or a top-down depth mask of the objects, the association engine can detect affordances and predict trajectory-level effects generated on the objects. The robot learns how to tune action control parameters in response to haptic feedback during action execution. It adapts its pushing trajectory based on the discrepancy between observed and predicted tactile feedback (as illustrated in the figure). For parametric action association, we encode movement trajectories and expected sensory feedback in parametric temporal probabilistic models. The learned sensory feedback models (such as force/torque trajectories) are used to correct a perturbed movement.
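The idea of encoding a movement and its expected sensory feedback as a parametric temporal probabilistic model can be sketched with radial-basis-function weights fitted across demonstrations. This is a generic probabilistic-movement-primitive-style sketch under assumed basis functions and shapes, not the project's actual model:

```python
import numpy as np

def rbf_features(t, n_basis=10, width=0.02):
    """Normalized Gaussian basis functions over phase t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def fit_trajectory_model(demos, n_basis=10):
    """Fit a Gaussian over basis weights from demos of shape (n_demos, T)."""
    t = np.linspace(0.0, 1.0, demos.shape[1])
    phi = rbf_features(t, n_basis)                       # (T, n_basis)
    w = np.linalg.lstsq(phi, demos.T, rcond=None)[0].T   # (n_demos, n_basis)
    return w.mean(axis=0), np.cov(w, rowvar=False), phi

# Demonstrations: noisy force profiles around a common underlying shape.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
demos = np.sin(np.pi * t)[None, :] + 0.01 * rng.standard_normal((20, 100))
mean_w, cov_w, phi = fit_trajectory_model(demos)
expected = phi @ mean_w  # expected feedback trajectory over the phase
```

During execution, the deviation between measured feedback and `expected` (weighted by the learned covariance) can drive a corrective adaptation of the movement.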

We developed a method for estimating support relations between objects by generating a support graph from a depth image (see the figure). It intrinsically handles uncertainty and ambiguity, and gives rise to strategies for safe, bimanual manipulation.
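A toy version of support-graph extraction can be written for axis-aligned boxes instead of a depth image. This sketch is purely illustrative and omits the uncertainty handling that the actual method provides:

```python
def support_edges(boxes):
    """boxes: name -> (x_min, x_max, z_bottom, z_top).
    Returns directed edges (a, b) meaning 'a supports b': b rests on top
    of a with horizontal overlap."""
    edges = set()
    for a, (ax0, ax1, _, az1) in boxes.items():
        for b, (bx0, bx1, bz0, _) in boxes.items():
            if a != b and min(ax1, bx1) > max(ax0, bx0) and abs(bz0 - az1) < 1e-9:
                edges.add((a, b))
    return edges

def safe_removal_order(boxes, edges):
    """Remove objects that nothing rests on first (reverse topological order)."""
    remaining, order = set(boxes), []
    while remaining:
        free = [a for a in remaining
                if not any(u == a and v in remaining for u, v in edges)]
        order.extend(sorted(free))
        remaining -= set(free)
    return order

# A small stack: 'lid' rests on 'case', which rests on 'table'.
boxes = {"table": (0.0, 1.0, 0.0, 0.1),
         "case":  (0.2, 0.8, 0.1, 0.3),
         "lid":   (0.2, 0.8, 0.3, 0.35)}
edges = support_edges(boxes)
order = safe_removal_order(boxes, edges)
```

Topologically ordering the support graph yields a removal sequence in which no object is pulled out from under another.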

Finding complete plans is often not necessary under uncertainty and partial observability. Thus, we hierarchically decompose tasks to reduce the planning horizon and avoid complete plan computations where appropriate. Our system learns effect models of actions with the help of a physics simulator, which it preferentially chooses for risky actions (see the figure). We developed methods for harmonizing symbolic hierarchical task planning with parametric motion planning.
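The hierarchical decomposition can be illustrated with a minimal HTN-style expansion, where abstract tasks are refined into subtasks and anything without a method is treated as a primitive action. Task names and the method table are invented for illustration:

```python
# Methods map an abstract task to an ordered list of subtasks;
# any task without a method is a primitive, directly executable action.
METHODS = {
    "disassemble_hdd": ["remove_lid", "remove_platters"],
    "remove_lid": ["unscrew_lid_screws", "lever_lid"],
    "remove_platters": ["unscrew_spindle", "pick_platters"],
}

def expand(task):
    """Depth-first refinement of an abstract task into primitive actions."""
    if task not in METHODS:
        return [task]  # primitive action
    plan = []
    for sub in METHODS[task]:
        plan.extend(expand(sub))
    return plan

plan = expand("disassemble_hdd")
```

In a reactive setting, the expansion of later abstract tasks can be deferred until earlier ones succeed, which keeps the effective planning horizon short.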

To allow the robot to simulate anticipated action effects internally, we developed methods for turning perceived scene data into simulation models. We use the SOFA simulator and constraint-based solvers to model the interactions between objects. We developed a novel method for simulating active suction cups, allowing the simulation and improvement of control of an important class of grippers.

For parameter estimation for simulation of manipulated objects, we developed a method for active deformation of soft objects by visual servoing (see the figure) which estimates how the object deforms under robot action.

We developed a multi-functional gripper specially designed for disassembly of small objects (see the figure) on the basis of systematic analysis of the assembly actions required. For its operation, we developed a new, trainable motion representation with superior adaptation and extrapolation capabilities (see the figure).
In visual perception, we achieved reliable recognition of object parts from relatively small task-specific training sets by combining pre-trained deep neural networks with data augmentation.

Our new Conditional Neural Movement Primitives (CNMP) learn complex temporal multi-modal sensorimotor relations in connection with external parameters and goals, and outperform existing learning-from-demonstration methods in several respects. Conditioned on an external goal and on the sensor readings at each time step, the CNMP generates a sensorimotor trajectory and reacts to unexpected events. CNMP can learn the nonlinear relations between low-dimensional parameter spaces, high-dimensional sensorimotor spaces, and complex motions.
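The core conditioning operation, predicting the rest of a trajectory from a few observed (time, value) pairs, can be illustrated with Gaussian-process regression as a stand-in. CNMP itself is a neural encoder/decoder architecture; this sketch only mimics the conditioning step, with kernel and parameters chosen arbitrarily:

```python
import numpy as np

def condition_trajectory(t_obs, y_obs, t_query, length_scale=0.2, noise=1e-4):
    """Predict trajectory values at t_query given observations (t_obs, y_obs)
    via Gaussian-process regression with an RBF kernel. CNMP performs an
    analogous conditioning with learned representations instead of a kernel."""
    def k(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length_scale ** 2))
    K = k(t_obs, t_obs) + noise * np.eye(len(t_obs))
    return k(t_query, t_obs) @ np.linalg.solve(K, y_obs)

# Observed sensor readings at three phase points of a bell-shaped profile.
t_obs = np.array([0.0, 0.5, 1.0])
y_obs = np.sin(np.pi * t_obs)          # readings: 0, 1, 0
pred = condition_trajectory(t_obs, y_obs, np.array([0.25, 0.5, 0.75]))
```

Feeding the current sensor reading back in as a fresh observation at every time step is what lets such a conditioned model react to unexpected events.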

Our innovative planning methods allow efficient planning and reactive replanning under uncertainty. They allow a learning robot to make best use of different environments, including simulated scenes, optimizing criteria such as cost, time, wear, and risk. Our interleaved symbolic task planning and parametric motion planning allows bimanual operations to be planned symbolically while respecting geometric constraints such as object alignment or collision avoidance.

Our novel methods for simulating suction cups allow them to be used in machine-learning contexts and their design and control to be optimized empirically. More generally, learning the deformation of soft objects provides a novel way of obtaining crucial physical parameters for simulation.

The multi-functional gripper comprises multiple innovations, combining a gripper with a separate finger and a built-in tool changer, allowing the execution of diverse actions such as grasping, picking, unscrewing, levering, suction gripping, pushing, etc., all while holding the object in hand. Actions can be trained using novel Via-Point Movement Primitives that generalize and extrapolate better than conventional methods.
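Via-point adaptation of a learned movement can be sketched with standard Gaussian conditioning on a distribution over basis weights. This follows the generic probabilistic-movement-primitive recipe and is not necessarily the project's Via-Point Movement Primitive formulation:

```python
import numpy as np

def condition_on_via_point(mu_w, cov_w, phi_t, y_star, sigma2=1e-6):
    """Condition a Gaussian over basis weights w ~ N(mu_w, cov_w) on the
    trajectory passing through y_star at the phase whose feature row is
    phi_t, i.e. on the linear observation y = phi_t @ w."""
    s = phi_t @ cov_w @ phi_t + sigma2            # scalar predictive variance
    gain = cov_w @ phi_t / s                      # Kalman-style gain
    mu_new = mu_w + gain * (y_star - phi_t @ mu_w)
    cov_new = cov_w - np.outer(gain, phi_t @ cov_w)
    return mu_new, cov_new

# Toy case: 5 weights, unit prior covariance, via-point activating weight 2.
mu_w, cov_w = np.zeros(5), np.eye(5)
phi_t = np.array([0.0, 0.0, 1.0, 0.0, 0.0])       # features at via-point phase
mu_new, cov_new = condition_on_via_point(mu_w, cov_w, phi_t, y_star=2.0)
```

After conditioning, the mean trajectory passes through the via-point and the remaining uncertainty there collapses, while correlated weights adapt the rest of the movement smoothly.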

In future work, the gripper will be integrated into the demonstrator, allowing it to operate in a closed loop of perception, affordance detection, action planning aided by simulation, and execution by a real robot. Novel methods will be developed that allow simulated experience on a class of devices to be leveraged in disassembly of similar devices by the real system, and that exploit the robot's reasoning capabilities to achieve generalization to wider ranges of tasks and to recover flexibly from unexpected situations.
Scene reconstruction pipeline
Active deformation control for estimating object parameters for physical simulation
Training Via-Point Motion Primitives
Estimating support relations between objects from depth data
The robot pushes the object to a position (bottom-most) closer to the intended one (right-most).
Concept of the IMAGINE multi-functional gripper.
Functional architecture of our algorithm for learning action models from several environments