
Understanding human action from unstructured 3D point clouds using deep learning methods

Periodic Reporting for period 2 - 3DInAction (Understanding human action from unstructured 3D point clouds using deep learning methods)

Reporting period: 2023-01-01 to 2023-12-31

In this research, we address the problem of human action recognition and understanding, which is crucial for many autonomous robotic systems and other engineering applications. These systems must accurately recognize and forecast human actions, which is typically achieved using video data and deep learning methods. This research aims to improve upon these methods by using 3D data, specifically 3D point clouds, to increase the accuracy and robustness of human action recognition. 3D data is important because it allows a more faithful representation of human actions in the real world, taking into account the spatial changes that occur in 3D environments. The overall objectives of this research are to devise novel deep learning algorithms for 3D human action recognition and forecasting, to create an annotated dataset for training and testing these algorithms, and to suggest new methods for tackling the challenges of 3D human action recognition. The resulting algorithms have the potential to be used in a wide range of scientific and engineering applications, such as human-robot interaction.
Research:
During the first period of this project, we proposed seven novel methods for point cloud processing and action understanding. Three of these have been accepted and published at CVPR, at IROS, and in Computers & Graphics; another three have been submitted and are currently under review; and one is being prepared for submission to ICCV in March.
Details of each paper:
* GoferBot - we proposed, constructed, and evaluated a human-robot collaborative system for the task of assembling furniture, published and presented at IROS. In this system, a human was tasked with assembling a piece of IKEA furniture, while a robot arm equipped with a Kinect camera had to infer the current human action, predict the next action, and retrieve the next assembly piece. We conducted a thorough evaluation of the system's individual components and of the system as a whole. Surprisingly, despite the system performing objectively faster than command-based alternatives, we found that humans perceived the collaboration to be less fluent.
* DiGS - we proposed a novel divergence-based neural implicit representation for point clouds, published at CVPR 2022. In this project, we take point clouds as input and train a neural network to represent a signed distance function whose zero level set is the surface from which the points were sampled (a minimal sketch of this kind of training objective is given after this list). We demonstrated improved performance over existing state-of-the-art methods, particularly for unoriented point clouds.
* CloudWalker - we proposed a novel method for representing point clouds using random walks, published in Computers & Graphics 2022 (see the walk-generation sketch after this list).
* IKEA Assembly in the Wild (IAW) dataset - we collected a dataset of YouTube IKEA assembly videos and annotated them with alignments to instruction-manual diagrams. We proposed a novel method to solve this alignment problem using contrastive learning. Submitted and currently under review.
* GraVoS - we propose a novel method that improves 3D point cloud detection methods using gradient-based sample selection (sketched after this list). Submitted and currently under review.
* OG surface reconstruction - we propose a novel method for unoriented point cloud surface reconstruction guided by an octree data structure. Submitted and currently under review.
* IKEA Ego dataset - we collected and annotated a dataset of human assembly actions captured from an egocentric perspective using a HoloLens 2 headset. The dataset includes point clouds, RGB, depth, camera pose, and hand-tracking information, alongside action label annotations. The paper will be submitted to ICCV in March.
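To make the neural-implicit idea behind DiGS concrete, below is a minimal sketch of that style of training objective: a surface term pinning the zero level set to the input points, an Eikonal term, and a divergence (Laplacian) penalty on the gradient field. The network architecture, the `sdf_loss` helper, and the loss weights are illustrative assumptions, not the published implementation.

```python
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    """Small MLP mapping 3D points to a scalar signed-distance value."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.mlp(x)

def grad(y, x):
    """dy/dx, keeping the graph so the gradient itself can be differentiated."""
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]

def sdf_loss(net, surface_pts, domain_pts):
    # 1) Surface term: the SDF should vanish on the input points,
    #    so the zero level set passes through the scan.
    loss_surf = net(surface_pts).abs().mean()

    # 2) Eikonal term: |grad f| = 1, as for a true distance function.
    domain_pts.requires_grad_(True)
    g = grad(net(domain_pts), domain_pts)              # (N, 3) gradient field
    loss_eik = ((g.norm(dim=-1) - 1.0) ** 2).mean()

    # 3) Divergence term: penalize div(grad f) to encourage a smooth,
    #    consistently oriented gradient field (helpful when the input
    #    points carry no normals).
    div = sum(grad(g[:, i:i + 1], domain_pts)[:, i] for i in range(3))
    loss_div = div.abs().mean()

    # Weights here are placeholders, not the published schedule.
    return loss_surf + 0.1 * loss_eik + 0.01 * loss_div
```

Here `surface_pts` would come from the input scan and `domain_pts` from random samples in the bounding volume.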
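Similarly, here is a rough sketch of the random-walk idea behind CloudWalker: walks over the point cloud's k-nearest-neighbor graph yield point sequences that a standard sequence model can consume. The walk policy (prefer unvisited neighbors) and all parameters are simplifying assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy.spatial import cKDTree

def random_walks(points, n_walks=8, walk_len=32, k=8, seed=0):
    """Return an (n_walks, walk_len) index array of walks over the k-NN graph.

    points : (N, 3) array of xyz coordinates.
    """
    rng = np.random.default_rng(seed)
    tree = cKDTree(points)
    # Neighbor index 0 is the point itself, so query k + 1 and drop it.
    _, nbrs = tree.query(points, k=k + 1)
    nbrs = nbrs[:, 1:]

    walks = np.empty((n_walks, walk_len), dtype=np.int64)
    for w in range(n_walks):
        cur = rng.integers(len(points))          # random start point
        visited = {cur}
        for t in range(walk_len):
            walks[w, t] = cur
            # Prefer unvisited neighbors so the walk spreads over the shape.
            options = [j for j in nbrs[cur] if j not in visited]
            cur = rng.choice(options) if options else rng.choice(nbrs[cur])
            visited.add(cur)
    return walks

# Usage: points[walks] has shape (n_walks, walk_len, 3) and can be fed
# to a sequence model such as an RNN.
pts = np.random.rand(1024, 3)
seq = pts[random_walks(pts)]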
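Finally, a very loose sketch of the gradient-based selection idea named by GraVoS: score each scene element by the magnitude of the loss gradient it receives, then keep the highest-scoring (hardest) elements. The detection loss, feature layout, and keep ratio below are placeholders, not the published pipeline.

```python
import torch

def select_by_gradient(features, loss_fn, keep_ratio=0.5):
    """features: (N, C) per-element features; loss_fn maps features to a
    scalar detection loss. Returns indices of the highest-gradient elements."""
    features = features.detach().requires_grad_(True)
    loss = loss_fn(features)
    grads, = torch.autograd.grad(loss, features)
    scores = grads.norm(dim=1)                  # per-element gradient magnitude
    k = max(1, int(keep_ratio * features.shape[0]))
    return scores.topk(k).indices               # the "hard" elements to keep
```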

Exploitation: I have been contacted by IKEA digital lab representatives, and we have had several meetings to discuss possible collaborations. Nothing concrete has materialized yet, but I expect a collaboration to be viable in the near future, as our research interests align well with their work.
Furthermore, I contacted Mobileye and met with some of their representatives to present our work. We discussed possibilities for collaboration that have not yet materialized. Additionally, several start-up companies have expressed interest in my work, but we are still at a very early stage of communication and I do not expect these contacts to materialize into a collaboration.

Dissemination and outreach: I co-organized the 2021 and 2022 Robotic Vision Summer Schools in Australia. Additionally, I created the Talking Papers Podcast, where I host authors of seminal papers in the fields of computer vision and machine learning. Furthermore, I presented our work at CVPR 2022 and Israel Vision Day 2021, and attended ECCV 2022 and Israel Vision Day 2022 in person.
The objective of this project is to develop a comprehensive framework for recognizing human actions in 3D environments. This is a challenging task, as current action recognition methods do not take full advantage of 3D data. Through this research, we have generated new knowledge, including datasets, advanced algorithms, and reliable code. The outcomes of this interdisciplinary study, spanning areas such as 3D computer vision and geometry processing, are expected to have a positive impact on multiple research fields and industries. Within my community, three papers have already been published and another four are expected to be published in the coming year. All manuscripts have been uploaded to arXiv to be freely available, and all links, code, and resources are available on my personal website. Following the publication of our IKEA Ego dataset and the IAW dataset, we plan to organize a workshop at one of the leading conferences (ICCV, CVPR, ECCV, WACV, etc.) to promote more research in this domain. Furthermore, we intend to propose novel methods to address the task of human action understanding from 3D point cloud data.
The socio-economic and wider societal implications of the project are expected to be substantial if and when the results are integrated into cutting-edge vision systems, for example in autonomous cars or wearable headsets. The knowledge generated by this research provides the foundation for these applications to flourish in dynamic human environments.