
Understanding Deep Face Recognition

Periodic Reporting for period 4 - DeepFace (Understanding Deep Face Recognition)

Reporting period: 2021-11-01 to 2023-04-30

The advent of deep learning has brought machines to what is considered a human level of performance earlier than anticipated. However, many research questions remain open. The project was set up to study these questions in the context of face recognition, since it is one of the few domains in machine learning in which millions of classes are routinely learned, and since it presents a trade-off between subtle inter-identity variations and pronounced intra-identity variations.
The proposal focused on three domains of research. The first is the study of methods that promote effective transfer learning. The second is the study of the trade-offs that govern the optimal utilization of the training data, and of how the properties of the training data affect the optimal network design. The third is the post-transfer utilization of the learned deep networks, where, given the representations of a pair of face images, we seek to compare them as accurately as possible.
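
To make the third domain concrete, below is a minimal sketch of pairwise face verification via embedding comparison. The embedding network and the decision threshold are placeholders, not the project's actual models; the threshold would in practice be tuned on a validation set.

```python
import torch
import torch.nn.functional as F

def verify(emb_a: torch.Tensor, emb_b: torch.Tensor, threshold: float = 0.5) -> bool:
    """Decide whether two face embeddings belong to the same identity.

    emb_a, emb_b: 1-D embedding vectors produced by a (hypothetical)
    pre-trained face network; `threshold` is an illustrative value.
    """
    # L2-normalize so the comparison reduces to cosine similarity.
    a = F.normalize(emb_a, dim=0)
    b = F.normalize(emb_b, dim=0)
    similarity = torch.dot(a, b).item()
    return similarity >= threshold
```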
An emphasis on theoretical reasoning supports the developed methods with a mathematical framework that both justifies their usage and provides concrete guidelines for applying them.
The methods developed in the project have made transfer learning much more effective and enabled more accurate deep metric learning. The theory-based approach, in which many of the results are stated as concrete theorems, leads to principled ways to practice deep learning. Additionally, in light of the evolving societal focus on ethics in AI, considerable effort has been devoted to explainable AI models and to fairness, with significant impact, especially for emerging models such as transformers.
The proposal discusses three axes of research: transfer learning, optimization of training data, and metric learning. Some work in these domains had been carried out prior to the start date. Once the project started, we continued to work on metric learning, achieving state-of-the-art results for retrieval based on compact codes, on transfer learning, and on very efficient usage of out-of-domain training data.
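
As a generic illustration of the compact-code retrieval setting, here is a sketch that binarizes embeddings with random sign projections and ranks a gallery by Hamming distance. The random projection is a stand-in for a learned hashing function, and all dimensions and names are illustrative assumptions, not the project's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def compact_codes(embeddings: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map float embeddings (n, d) to binary codes (n, b) by thresholding
    projections -- a simple stand-in for a learned hashing function."""
    return (embeddings @ projection > 0).astype(np.uint8)

def hamming_retrieve(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k gallery codes closest to `query` in Hamming distance."""
    distances = (gallery != query).sum(axis=1)
    return np.argsort(distances)[:k]

# Hypothetical usage: 512-d embeddings compressed to 64-bit codes.
projection = rng.standard_normal((512, 64))
gallery_codes = compact_codes(rng.standard_normal((1000, 512)), projection)
query_code = compact_codes(rng.standard_normal((1, 512)), projection)[0]
print(hamming_retrieve(query_code, gallery_codes))
```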

In addition, we shifted our interest to a particular case of transfer learning, in which one maps visually between domains in a completely unsupervised way. We proposed both practical algorithms and methods that are grounded, as suggested in the proposal, in theoretical reasoning. In these contributions, face datasets serve as the main testbed for the various methods.
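
One common way to constrain such unsupervised cross-domain mapping is a cycle-consistency term: mapping a sample to the other domain and back should reconstruct it. The sketch below uses toy linear generators purely for illustration; real systems use convolutional generators trained with an additional adversarial loss, and this is not presented as the project's specific algorithm.

```python
import torch
import torch.nn as nn

# Hypothetical generators mapping between domains A and B; in practice
# these would be deep convolutional networks trained adversarially.
g_ab = nn.Linear(128, 128)   # A -> B
g_ba = nn.Linear(128, 128)   # B -> A
l1 = nn.L1Loss()

def cycle_loss(x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
    """Cycle-consistency term: round-trip translation should reconstruct
    the input, which constrains the mapping without paired supervision."""
    return l1(g_ba(g_ab(x_a)), x_a) + l1(g_ab(g_ba(x_b)), x_b)
```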

Unsupervised methods were further applied elsewhere, specifically in medical imaging, where we also applied advanced supervised methods. Image generation is a major task in unsupervised learning, and excellent conditional image generation results were presented by the team. In addition, we studied unsupervised learning of facial transformations and worked on lip synchronization in facial videos.

Our work on learning more robust representations using multiverse networks continued in applications to NLP, and it evolved into a method for conditioning training as well as into a theoretical analysis of deep neural networks. Our interest then turned to another form of network adaptivity called hypernetworks, with which we obtained state-of-the-art results in 3D reconstruction.
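
For intuition, here is a minimal sketch of the hypernetwork idea, in which one network emits the weights of another instead of those weights being stored as parameters. The layer sizes and the meaning of the conditioning code `z` are illustrative assumptions, not the project's 3D reconstruction architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """A linear layer whose weights are produced by a small hypernetwork
    conditioned on a code `z`, rather than stored as fixed parameters."""

    def __init__(self, z_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(z_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(z_dim, out_dim)

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # Generate this layer's weights and bias from the code z.
        w = self.weight_gen(z).view(self.out_dim, self.in_dim)
        b = self.bias_gen(z)
        return F.linear(x, w, b)

# Hypothetical usage: `z` could encode one particular shape, and the
# generated layer then acts as part of its implicit 3D representation.
layer = HyperLinear(z_dim=16, in_dim=3, out_dim=64)
out = layer(torch.randn(10, 3), torch.randn(16))
```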

In line with the research community's shift, following societal concerns, toward privacy, fairness, and interpretability, the topic of explainability has become a major interest of ours. Our work sheds light on the reasoning of black-box classifiers and also designs recommendation systems that are explainable. A major emphasis was put on employing explainable AI systems for better classification and image generation. Another, orthogonal, line of effort in exposing the inner workings of AI models was dedicated to describing images with text.
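
As a simple member of the family of methods that shed light on black-box classifiers, gradient saliency highlights the pixels to which a classifier's decision is most sensitive. This is a generic textbook example, not the project's explainability method; `model` is a placeholder classifier returning class scores.

```python
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor, target: int) -> torch.Tensor:
    """Gradient saliency: |d(score)/d(pixel)| indicates which pixels the
    classifier's decision for class `target` depends on most strongly."""
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target]  # scalar class score
    score.backward()
    return image.grad.abs().max(dim=0).values  # collapse channels to (H, W)
```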

In the realm of securing face recognition systems by studying their vulnerabilities, we received a paper award at the Face and Gesture 2021 conference for work describing a dictionary attack based on StyleGAN, and we later extended this work to 3D. We also worked on semantic face editing and on representing real-world images in the latent space of StyleGAN. This last work won the best paper award at SCIA 2023.
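
The basic mechanism behind representing a real image in a GAN's latent space is to optimize a latent code against a reconstruction loss. Below is a minimal sketch under that assumption, with a placeholder `generator` and a plain MSE objective; published inversion methods typically add perceptual losses and use extended latent spaces rather than this bare form.

```python
import torch

def invert(generator: torch.nn.Module, target: torch.Tensor,
           z_dim: int = 512, steps: int = 500, lr: float = 0.05) -> torch.Tensor:
    """Find a latent code whose generated image matches `target` by
    direct gradient-based optimization (basic latent-space inversion)."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), target)
        loss.backward()
        opt.step()
    return z.detach()
```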
As recent work demonstrates, image generation is crucial both for training and for attacking face recognition systems. We also explored generative algorithms for other vision problems, as well as for audio and other non-vision modalities.
Figure: An illustration of the scene generation process
Figure: A schematic illustration of our explainable recommendation system
Figure: An illustration of the process of generating a new face image by combining two images