
Understanding Deep Face Recognition

Periodic Reporting for period 2 - DeepFace (Understanding Deep Face Recognition)

Reporting period: 2018-11-01 to 2020-04-30

The advent of deep learning has brought machines to what is considered a human level of performance earlier than anticipated. However, many research questions remain open. The project was set up to study these questions in the context of face recognition, since it is one of the few domains in machine learning in which millions of classes are routinely learned and which presents a trade-off between subtle inter-identity variations and pronounced intra-identity variations.

The proposal has focused on three domains of research: (i) the study of methods that promote effective transfer learning, (ii) the study of the trade-offs that govern the optimal utilization of the training data and how the properties of the training data affect the optimal network design, and (iii) the post-transfer utilization of the learned deep networks, where, given the representations of a pair of face images, we seek to compare them in the most accurate way. An emphasis was put on basing the work on a theoretical framework.
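As a simple illustration of axis (iii), and not the project's actual verification method, a common baseline for comparing the representations of two face images is cosine similarity between their embedding vectors; the embedding dimensionality and the toy vectors below are assumptions for the sketch:

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Compare two face embeddings; a higher score suggests the same identity."""
    a = a / (np.linalg.norm(a) + eps)
    b = b / (np.linalg.norm(b) + eps)
    return float(a @ b)

# Hypothetical 128-d embeddings, standing in for the output of a face network.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=128)
emb_b = emb_a + 0.1 * rng.normal(size=128)  # small intra-identity variation
emb_c = rng.normal(size=128)                # a different identity

same = cosine_similarity(emb_a, emb_b)
diff = cosine_similarity(emb_a, emb_c)
assert same > diff
```

The more accurate comparison methods studied in the project can be viewed as learned replacements for this fixed similarity function.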

If the project is successful, new methodologies will be introduced that make transfer learning much more effective and enable more accurate deep learning. In addition, grounding our results in concrete theorems would lead to a more principled way to practice object recognition. Additionally, as detailed below, with the evolving societal focus on ethics in AI, we have become much more concerned with explainable models and fairness.
The proposal discusses three axes of research: transfer learning, optimization of training data, and metric learning. Some work had been achieved in these domains prior to the start date (CVPR 2016 papers on metric learning for matching and cross-domain metric learning, CVPR 2016 and ICPR 2016 papers on improving transfer learning with multiverse networks, and an IMAIAI 2016 paper on the theory of transfer learning). As the project started, we continued to work on metric learning [14], achieving state-of-the-art results for retrieval based on compact codes, on transfer learning [17,25], and on very efficient usage of out-of-domain training data [19].

In addition, we shifted our interest to a particular case of transfer learning, in which one maps visually between domains in a completely unsupervised way. We propose both practical algorithms [2,11,12,23,24,27] and methods that are grounded, as suggested in the proposal, in theoretical reasoning [20,25]. In these contributions, face datasets serve as the main testbed for the various methods. Unsupervised methods have been further applied elsewhere [13,16,18] and specifically in medical imaging [7,10], where we also applied advanced supervised methods [1]. Generation of images is a major task in unsupervised learning, and excellent conditional image generation results were presented in [9]. In addition, we have studied unsupervised learning of facial transformations [26] and worked on lip synchronization in facial videos [3].
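One widely used constraint in unsupervised cross-domain mapping, offered here only as an illustration and not as the specific losses of [2,11,12], is cycle consistency: a mapping G from domain A to domain B and a mapping F back should reconstruct the original sample. The toy 1-D mappings below are assumptions for the sketch:

```python
import numpy as np

# Hypothetical learned mappings between two 1-D "domains".
G = lambda a: 2.0 * a + 1.0      # A -> B
F = lambda b: (b - 1.0) / 2.0    # B -> A (here, the exact inverse of G)

def cycle_consistency_loss(a_samples):
    """L1 penalty encouraging F(G(a)) == a; zero when F inverts G exactly."""
    reconstructed = F(G(a_samples))
    return float(np.mean(np.abs(reconstructed - a_samples)))

a = np.linspace(-1.0, 1.0, 5)
loss = cycle_consistency_loss(a)
```

In practice, G and F are deep networks trained jointly, and this reconstruction penalty is combined with adversarial losses that keep the mapped samples inside the target domain.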

Our work on learning representations that are more robust, using the multiverse networks, continued in applications to NLP [4] and has evolved into a method to condition training [22] and a theoretical analysis of deep neural networks [15]. A current emphasis in our lab is the use of another form of network adaptivity called hypernetworks, with which we obtain state-of-the-art results in 3D reconstruction [8].
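A minimal sketch of the hypernetwork idea, assuming a single linear hypernetwork and a single linear target layer purely for illustration: instead of learning fixed weights, one network emits the weights of another, conditioned on an input vector (e.g. an embedding of the sample or task):

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(z, out_shape, W_h, b_h):
    """Generate the weight matrix of a target layer from a conditioning vector z."""
    flat = W_h @ z + b_h
    return flat.reshape(out_shape)

# Conditioning vector (an assumption: 4-d for the sketch).
z = rng.normal(size=4)

# Hypernetwork parameters that emit a 3x5 target weight matrix.
out_shape = (3, 5)
W_h = rng.normal(size=(np.prod(out_shape), z.size))
b_h = rng.normal(size=np.prod(out_shape))

W_target = hypernetwork(z, out_shape, W_h, b_h)

# The primary network then applies the generated weights to its own input.
x = rng.normal(size=5)
y = W_target @ x
```

During training, gradients flow through the generated weights back into the hypernetwork's parameters, so the same mechanism adapts the primary network per input, which is the sense in which this generalizes dynamic convolutions.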

Together with the shift of the research community, following societal concerns, toward privacy, fairness, and interpretability, the topic of explainability has become a major interest of ours. Our work sheds light on the reasoning of black-box classifiers [6,19] and also designs recommendation systems that are explainable [5].

[1] S. Gur, T. Shaharabany, L. Wolf. End to End Trainable Active Contours via Differentiable Rendering. ICLR, 2020.
[2] R. Mokady, S. Benaim, L. Wolf, A. Bermano. Mask Based Unsupervised Content Transfer. ICLR, 2020.
[3] Y. Shalev, L. Wolf. End to End Lip Synchronization with a Temporal AutoEncoder. WACV, 2020.
[4] I. Malkiel, L. Wolf. Maximal Multiverse Learning for Promoting Cross-Task Generalization of Fine-Tuned Language Models. In submission, 2020.
[5] E. Shulman, L. Wolf. Meta Decision Trees for Explainable Recommendation Systems. AIES, 2020.
[6] W-J. Nam, S. Gur, J. Choi, L. Wolf, S-W. Lee. Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks. AAAI, 2020.
[7] S. Gur, L. Wolf, L. Golgher, P. Blinder. Microvascular Dynamics from 4D Microscopy Using Temporal Segmentation. Pacific Symposium on Biocomputing (PSB), 2020.
[8] G. Littwin, L. Wolf. Deep Meta Functionals for Shape Representation. ICCV, 2019.
[9] O. Ashual, L. Wolf. Specifying Object Attributes and Relations in Interactive Scene Generation. ICCV, 2019. Best paper honorable mention.
[10] S. Gur, L. Wolf, L. Golgher, P. Blinder. Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network. ICCV, 2019.
[11] T. Cohen, L. Wolf. Bidirectional One-Shot Unsupervised Domain Mapping. ICCV, 2019.
[12] S. Benaim, M. Khaitov, T. Galanti, L. Wolf. Domain Intersection and Domain Difference. ICCV, 2019.
[13] S. Gur, L. Wolf. Single Image Depth Estimation Trained via Depth from Defocus Cues. CVPR, 2019.
[14] B. Klein, L. Wolf. End-to-End Supervised Product Quantization for Image Search and Retrieval. CVPR, 2019. Preliminary preprint arXiv:1711.08589.
[15] E. Littwin, L. Wolf. On the Co
As mentioned above, there is a current emphasis in our lab on using hypernetworks, a technique that our lab was arguably the first to use, under the name of dynamic convolutions (Klein et al., CVPR 2015). With this technique, we currently hold state-of-the-art results in 3D reconstruction from a single image (ICCV, 2019), in action recognition without outside data, and in object recognition with fewer than 50M parameters. We are also state of the art on active contour image segmentation benchmarks (ICLR 2020).
An illustration of the process of generating a new face image by combining two images
A schematic illustration of our explainable recommendation system
An illustration of the scene generation process