Bimanual Manipulation of Rigid and Deformable Objects

Periodic Reporting for period 2 - BIRD (Bimanual Manipulation of Rigid and Deformable Objects)

Okres sprawozdawczy: 2022-03-01 do 2023-08-31

All day long, our fingers touch, grasp and move objects in various media such as air, water, oil. We do this almost effortlessly - it feels like we do not spend time planning and reflecting over what our hands and fingers do or how the continuous integration of various sensory modalities such as vision, touch, proprioception, hearing help us to outperform any other biological system in the variety of the interaction tasks that we can execute. Largely overlooked, and perhaps most fascinating is the ease with which we perform these interactions resulting in a belief that these are also easy to accomplish in artificial systems such as robots. However, there are still no robots that can easily hand-wash dishes, button a shirt or peel a potato. Our claim is that this is fundamentally a problem of appropriate representation or parameterization.
When interacting with objects, the robot needs to consider geometric, topological, and physical properties of objects. This can be done either explicitly, by modeling and representing these properties, or implicitly, by learning them from data. The main scientific objective of this project is to create new informative and compact representations of deformable objects that incorporate both analytical and learning-based approaches and encode geometric, topological, and physical information about the robot, the object, and the environment. We will do this in the context of challenging multimodal, bimanual object interaction tasks. The focus will be on physical interaction with deformable objects using multimodal feedback. To meet these objectives, we will use theoretical and computational methods together with rigorous experimental evaluation to model skilled sensorimotor behavior in bimanual robot systems.

In Period 1, we have conducted research along planned workpackages and subtasks, namely T1.1 T1.2 and T3.1. Regarding T1.1. Representations: definition, modelling, efficiency, an important objective was the research on designing compact low-dimensional representations for objects and actions that incorporate their geometric, topological, and physical properties that are relevant for specific tasks, to enable efficient control and planning strategies to perform these. To this end, we have developed a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces. Planning is performed in a low-dimensional latent state space that embeds images. We define and implement a Latent Space Roadmap which is a graph based structure that globally captures the latent system dynamics. We show the effectiveness of the method on a simulated box stacking task as well as a T-shirt folding task performed with a real robot, something relevant for T1.3. too. Furthermore, we have addressed additional representation learning problem for manipulation of deformable objects. In particular, we consider graph-based representations of deformable objects which arise naturally from their point-cloud representation. Through manipulation, we learn to coarsen this graph into a simpler representation which still captures the necessary dynamics of the object. Our model consists of a Cluster Assignment Model which takes the initial graph and coarsens it, a Coarsened Dynamics Model that approximates the dynamics of the coarsened graph and a Forward Prediction Model which predicts the next state. This is also a part of the work planned in T2.1 perceiving humans, scenes and objects. Finally, regading T3.1. Benchmarks for cloth manipulation we have published a manuscript in collaboration with colleagues from Institut de Robòtica i Informática Industrial, CSIC-UPC, Barcelona, Spain.

The scientific work continued as planned in P2 too. In relation to WP1, we focused on exploring learning representations of data that are equivariant with respect to a symmetry group G. Equivariant representations preserve the geometry of the data space in its latent space, leading to an isomorphism between data space and latent space. We achieve this by constructing a latent space which respects transformations given by known group actions in the data-space. In this manner, we achieve representations that are disentangled with respect to pose (group action) and class (orbit). Furthermore, we looked into meta-learning, thus having an objectives of learning-to-learn. Meta-Learning is a field of study concerned with learning-to-learn. We have demonstrated the work where models are trained in a multi-task setting, with the objective of learning novel tasks using only a small dataset (denoted as support-set in few-shot learning). The size of the support-set naturally induces variance in the adapted parameters, leading to essentially inefficient identification of model parameters and consequently learning a sub-optimal model for the specific task. In the conducted work, we propose to reduce this variance through an inverse variance weighting scheme, where the variance is the model uncertainty induced from each point in the support set. The model uncertainty, in turn, is found through the Laplace approximation. Another aspect that improves meta-learning performance is what representations are learnt that are common across the different tasks. We show that representations correspond to basis functions can be efficiently learnt in a multi-task setting. This enables to uncover a shared set of basis functions that essentially forms the governing equation of the data, independent of parameters. Our setting is related to that of system identification methods such as SINDy, which in contrast to ours consider only a single task setting. As such, they attempt to identify both parameters and basis functions simultaneously, which we show is prone to over fitting when data is small and inefficient in a multi-task setting.

In the original plan we structured the practical work along an important and challenging robotic manipulation task of cloth manipulation. We envisioned three levels of difficulty: i) Spreading a tablecloth, ii) Folding a towel, and iii) Partial dressing. In already published work and the uploaded publications we have successfully addressed all three. We have learned about the important challenges related to the tasks and we continue to use them to demonstrate our theoretical developments on them using three different robot platforms: YuMI, Baxter and Franka Emika arms.

The main focus for the first period was to develop methods for successful encoding of complex robotic manipulations tasks and work on theoretical methods for their evaluations. We both developed a data-driven visual-action planning framework for folding tasks and a Geometric Component Analysis (GeomCA) algorithm that evaluates representation spaces based on their geometric and topological properties. GeomCA can be applied to representations of any dimension, independently of the model that generated them. We demonstrated its applicability by analyzing representations obtained from a variety of scenarios, such as contrastive learning models, generative models and supervised learning models.

After Period 2, we will continue to work on teh representation learning for complex dynamical systems in the context of deformable objects.

Simulation of tasks involving complex objects

Periodic Reporting for period 2 - BIRD (Bimanual Manipulation of Rigid and Deformable Objects)

Udostępnij tę stronę

Pobierz