Final Report Summary - FER IN THE WILD (Facial Expression Recognition in the Wild)
------------
The goal of this project was to develop a fully automatic facial expression recognition system capable of handling uncontrolled real-world settings. The most crucial step in the pipeline was the development of a fully automatic face detection and facial landmark tracking system capable of handling faces in the wild. For this, we proposed several methodologies, each capable of real-time performance.
Firstly, we proposed a novel non-rigid face alignment framework that relies on a texture model built from the response maps generated by discriminatively trained filters (i.e. patch experts) and shows state-of-the-art performance under uncontrolled and natural settings [1]. Within this framework, we proposed a novel part-based discriminative regression approach, referred to as the Discriminative Fitting of Response Maps (DFRM) method [1,2], and a novel holistic generative approach, referred to as the Generative Response Map Fitting (GRMF) method [1].
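To make the response-map idea concrete, the following minimal sketch correlates a linear patch expert over a local search region, normalises the result into a probability-like response map, and takes a single mean-shift-like step towards its peak. This is an illustration of the general patch-expert mechanism only, not the DFRM or GRMF algorithms themselves; the filter `w`, the bias `b`, and all shapes are hypothetical.

```python
import numpy as np

def response_map(patch, w, b=0.0):
    """Slide a (hypothetical) discriminatively trained linear patch expert
    (w, b) over a local search region and return a normalised, probability-like
    response map."""
    ph, pw = w.shape
    H, W = patch.shape
    R = np.empty((H - ph + 1, W - pw + 1))
    for y in range(R.shape[0]):
        for x in range(R.shape[1]):
            R[y, x] = np.sum(w * patch[y:y + ph, x:x + pw]) + b
    R = np.exp(R - R.max())        # softmax-style normalisation
    return R / R.sum()

def expected_shift(R):
    """Expected landmark displacement under the response map: a single
    mean-shift-like step from the region centre towards the response peak."""
    ys, xs = np.mgrid[:R.shape[0], :R.shape[1]]
    cy, cx = (R.shape[0] - 1) / 2.0, (R.shape[1] - 1) / 2.0
    return np.sum(R * (ys - cy)), np.sum(R * (xs - cx))
```

In the actual framework the response maps from all patch experts are fitted jointly under a shape model; the sketch above shows only the per-landmark response computation.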
Secondly, to further exploit the part-based approach, we proposed multiple extensions to the Supervised Descent Method. We formulated an incremental training methodology that addresses the problem of automatically updating a discriminative facial deformable model (a problem that had not been thoroughly studied in the literature), allowing the model to tailor itself to the subject being tracked and to the imaging conditions using image sequences, and hence become person-specific over time [3].
Finally, building upon the real-time face tracking system developed using the above methodologies, a facial expression recognition system was implemented [4,5] that relied on aligning a 3D facial shape model to the estimated facial landmark points.
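The 3D-shape-to-landmarks alignment step can be illustrated with a standard scaled-orthographic (weak-perspective) least-squares fit. This is a common textbook formulation offered only as a sketch of the idea, not the project's actual pose estimator; all names below are hypothetical.

```python
import numpy as np

def weak_perspective_pose(X3d, x2d):
    """Fit a scaled-orthographic camera  x = s * R[:2] @ X + t  to 2D
    landmarks. Returns scale s, 3x3 rotation R, and 2D translation t."""
    Xc = X3d - X3d.mean(0)
    xc = x2d - x2d.mean(0)
    # Solve the 2x3 affine projection A minimising ||xc - Xc @ A.T||
    A, *_ = np.linalg.lstsq(Xc, xc, rcond=None)
    A = A.T                                   # 2x3
    # Orthonormalise the two rows to recover rotation and scale
    U, S, Vt = np.linalg.svd(A)
    s = S.mean()
    R2 = U @ np.eye(2, 3) @ Vt                # 2x3 with orthonormal rows
    R = np.vstack([R2, np.cross(R2[0], R2[1])])
    t = x2d.mean(0) - s * (R2 @ X3d.mean(0))
    return s, R, t
```

Given the tracked 2D landmarks and the mean 3D face shape, the recovered rotation directly yields the 3D head pose used downstream.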
All the code developed during this project has been made available online.
Contribution, Originality and Impact
---------------------------------------------------
The research conducted as part of this project has made several novel contributions.
Firstly, we proposed a face alignment framework that relied on a texture model generated by the responses of discriminatively trained part-based filters. Unlike standard texture models built from pixel intensities or from responses generated by generic filters (e.g. Gabor), our framework had two important advantages. First, by virtue of discriminative training, invariance to external variations (such as identity, pose, illumination and expression) was achieved. Second, we showed that the responses generated by discriminatively trained filters (or patch experts) were sparse and could be modeled using a very small number of parameters. As a result, optimization methods based on the proposed texture model could better cope with unseen variations. We illustrated this point by formulating both part-based [1,2] and holistic [1] approaches for generic face alignment and showed that our framework outperforms the state of the art on multiple "wild" databases.
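The claim that sparse response maps can be modeled with very few parameters can be illustrated by approximating a normalised response map with a single anisotropic Gaussian: a mean and a 2x2 covariance, i.e. five parameters instead of one per pixel. This is a hedged illustration of the low-parameter modelling idea, not the exact parametric model used in [1].

```python
import numpy as np

def gaussian_approximation(R):
    """Approximate a non-negative response map by a single anisotropic
    Gaussian via its weighted first and second moments: returns the mean
    (2 parameters) and the 2x2 symmetric covariance (3 parameters)."""
    R = R / R.sum()
    ys, xs = np.mgrid[:R.shape[0], :R.shape[1]]
    mu = np.array([np.sum(R * ys), np.sum(R * xs)])
    dy, dx = ys - mu[0], xs - mu[1]
    cov = np.array([[np.sum(R * dy * dy), np.sum(R * dy * dx)],
                    [np.sum(R * dx * dy), np.sum(R * dx * dx)]])
    return mu, cov
```

A sharply peaked response map is thus summarised by five numbers, which is what makes optimization over the compact texture model tractable.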
Secondly, we also proposed robust feature-fusion-based patch experts which showed impressive performance on faces in the wild [6]. We proposed fusing Histogram of Oriented Gradients (HOG) features and Image Gradient Orientation (IGO) features to train these robust patch experts. We showed that the response maps generated via HOG-based patch experts are highly robust for landmark localization, whereas the response maps generated via IGO-based patch experts offer high precision in landmark localization. This heterogeneous nature of HOG- and IGO-based patch experts makes them ideal candidates for feature fusion. To this end, we investigated various fusion methodologies and showed that the proposed Fusion by Concatenation of Correlated Features method works best. In addition, we also explored a 3D Constrained Local Model framework based on patch experts trained on histogram-based 3D facial geometry features [7].
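The two feature types can be sketched as follows: IGO keeps only the direction of the image gradient (as cosine/sine pairs, discarding magnitude), while HOG pools magnitude-weighted orientation histograms. The sketch below computes a deliberately simplified single-histogram HOG-like descriptor and fuses by plain normalised concatenation; the actual Fusion by Concatenation of Correlated Features method in [6] is more involved, so treat every function here as a hypothetical illustration.

```python
import numpy as np

def igo_features(img):
    """Image Gradient Orientation features: (cos, sin) of the gradient
    direction at every pixel, which discards gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    phi = np.arctan2(gy, gx)
    return np.concatenate([np.cos(phi).ravel(), np.sin(phi).ravel()])

def hog_like_features(img, bins=9):
    """A minimal HOG-style descriptor: a single magnitude-weighted
    orientation histogram over the whole patch (no cells or blocks)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    phi = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation
    hist, _ = np.histogram(phi, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def fused_features(img):
    """Z-normalise each feature vector and concatenate: a naive
    fusion-by-concatenation baseline, not the paper's exact method."""
    f = [hog_like_features(img), igo_features(img)]
    f = [(v - v.mean()) / (v.std() + 1e-8) for v in f]
    return np.concatenate(f)
```

A patch expert would then be trained on `fused_features` of positive (centred on the landmark) and negative (offset) patches.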
Thirdly, we addressed the very important problem of developing an adaptive facial tracking system capable of automatically tailoring itself to the subject being tracked and to the imaging conditions [3]. The development of facial databases with an abundance of annotated facial data captured under unconstrained 'in-the-wild' conditions has made discriminative facial deformable models the de facto choice for generic facial landmark localization. Even though many recently proposed discriminative techniques showed very good facial landmark localization performance, for applications that require excellent accuracy, such as facial behaviour analysis and facial motion capture, semi-automatic person-specific or even tedious manual tracking was still the preferred choice. One way to construct a person-specific model automatically is through incremental updating of the generic model. In this project, we dealt with the problem of updating a discriminative facial deformable model, a problem that had not been thoroughly studied in the literature. In particular, we studied, for the first time to the best of our knowledge, strategies to update a discriminative model trained by a cascade of regressors. We proposed very efficient strategies to update the model and showed that it is possible to automatically construct robust, discriminative, person- and imaging-condition-specific models 'in-the-wild' that outperform state-of-the-art generic face alignment strategies. In particular, we exploited the fact that the cascade of regressors was trained using Monte-Carlo sampling methodologies and presented a very efficient method which can incrementally update all linear regressors in the cascade in parallel.
We demonstrated that the proposed incremental methods for deformable model alignment: (1) were capable of adding new training samples and updating the model without re-training from scratch, thereby constantly increasing the robustness of the generic model; and (2) were capable of automatically tailoring themselves to the subject being tracked and the imaging conditions using image sequences, and hence becoming person-specific over time.
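The incremental-updating idea can be sketched with a recursive-least-squares update of a single linear regressor: each new (feature, shape-update) pair is absorbed via a rank-one inverse update rather than re-solving the full training problem. One such regressor sits at each cascade level, and since the levels are updated from their own samples they can be updated independently, hence in parallel. The class below is a hypothetical minimal illustration, not the parallel cascade update proposed in [3].

```python
import numpy as np

class IncrementalLinearRegressor:
    """Ridge-regularised linear regressor that absorbs new samples without
    retraining from scratch, via the Sherman-Morrison rank-one update of
    the inverse covariance."""
    def __init__(self, dim_in, dim_out, reg=1.0):
        self.P = np.eye(dim_in) / reg         # inverse of (X^T X + reg * I)
        self.W = np.zeros((dim_in, dim_out))  # regression weights
        self.B = np.zeros((dim_in, dim_out))  # accumulates X^T Y

    def update(self, x, y):
        """Incorporate one sample: features x -> target shape update y."""
        x = x.reshape(-1, 1)
        y = y.reshape(1, -1)
        Px = self.P @ x
        self.P -= (Px @ Px.T) / (1.0 + x.T @ Px)   # Sherman-Morrison step
        self.B += x @ y
        self.W = self.P @ self.B

    def predict(self, x):
        return x @ self.W
```

Because each update touches only that level's `P`, `B`, and `W`, the per-level updates carry no cross-level dependency and can run concurrently, which is the property the parallel cascade update exploits.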
In addition, throughout the project, special emphasis was placed on developing ready-to-use real-time software [3,4,5] for face detection, face and eye landmark detection and tracking, and 3D head-pose estimation, which could benefit the research community. All this software can be downloaded from:
https://sites.google.com/site/akshayasthana/codes
https://sites.google.com/site/chehrahome
As of July 2014, this software had been downloaded over 8000 times by students, researchers and practitioners. The face and eye tracking software CHEHRA, developed during the course of this project, is considered the current state of the art. Moreover, the findings of the project have further application to the general problem of non-rigid object registration and tracking. An extension of these methodologies to the medical imaging domain, for the purpose of MR image segmentation, has shown encouraging results [8,9].