The ITS A DIVE research methodology has been developed in three phases: acquisition, modelling, and evaluation.
In the acquisition phase, a large number of public HRTF databases from research labs worldwide have been collected and merged into a single large set of acoustic measurements (>400 human subjects). Beyond the organization and use of these public datasets, most resources in this phase have been allocated to the collection of a new dataset of custom acoustic measurements, named the Viking HRTF dataset, in collaboration with the University of Iceland. This dataset includes full-sphere HRTFs measured on a dense spatial grid using a binaural mannequin fitted with different artificial pinnae. Anthropometric data have been either extracted by pre-processing public databases or obtained from new measurements on 2D or 3D anatomical data (e.g. ear pictures, head meshes). In particular, new features related to the shape of the ear have been automatically extracted from 3D head/ear meshes: depth maps, edge maps (2D representations of the most prominent pinna edges), and reflection maps (selections of mesh points that theoretically produce reflections towards the ear canal entrance).
The modelling phase has focused on a blend of traditional signal processing techniques, state-of-the-art machine learning algorithms tuned to both global and local characteristics of HRTFs, and physically inspired models of sound propagation within the ear. Each structural component of the HRTF has been analyzed through ad hoc signal processing algorithms; this has been possible because some of the collected HRTF databases contain partial responses of head-only or earless mannequins. Then, since HRTFs are inherently high-dimensional because of the wide range of predictors, suitable dimensionality reduction and/or feature extraction techniques have been applied to partial HRTF data in order to obtain compact representations that can be correlated with anthropometric data. Finally, the most suitable machine learning techniques, including state-of-the-art deep learning algorithms, have been applied to yield the model that best meets speed, interpretability, and accuracy requirements. This procedure has allowed the design of a complete structural HRTF model combining measured, synthesized, and selected components.
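As an illustration of the dimensionality reduction step, the following is a minimal sketch, not the project's actual pipeline. It assumes principal component analysis (PCA) as the reduction technique and an ordinary least-squares map from anthropometric features to the PCA weights; the report does not state which techniques were used. All function names, array sizes, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean, principal directions, and scores for row-wise data X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    return mean, components, scores

def fit_linear_map(A, scores):
    """Least-squares map (with intercept) from features A to PCA scores."""
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])  # append intercept column
    W, *_ = np.linalg.lstsq(A1, scores, rcond=None)
    return W

def predict_spectrum(a, W, mean, components):
    """Predict a spectrum from one anthropometric feature vector a."""
    a1 = np.append(a, 1.0)
    return mean + (a1 @ W) @ components

# Synthetic stand-in data (assumption): 40 subjects, 5 anthropometric
# features, 64 frequency bins of log-magnitude response per subject.
rng = np.random.default_rng(0)
n_subjects, n_features, n_bins = 40, 5, 64
A = rng.normal(size=(n_subjects, n_features))
true_map = rng.normal(size=(n_features, n_bins))
X = A @ true_map + 0.01 * rng.normal(size=(n_subjects, n_bins))

mean, components, scores = fit_pca(X, n_components=5)
W = fit_linear_map(A, scores)
pred = predict_spectrum(A[0], W, mean, components)
```

In this toy setting the spectra depend linearly on the features, so a linear map recovers them almost exactly. Real HRTF data would likely call for the nonlinear models mentioned above.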
In the evaluation phase, signal-related error metrics and auditory models have been developed to compare the customized HRTFs obtained through the developed structural model against the original measured HRTFs of a number of database subjects. A good objective correspondence between the two sets is the basis for performing subjective tests. The HRTF models have then been integrated into a 3D game in order to perform individual tests with dynamic rendering of virtual sound sources. Metrics collected from the user tests include, among others, localization error, degree of externalization, and responses to an extensive user questionnaire. The final results showed that, overall, the participants performed best in the localization task with their individualized HRTF, with increased accuracy relative to generic HRTFs, especially in the case of expert game players.
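One signal-related error metric commonly used to compare a modelled HRTF against a measured one is spectral distortion, the root-mean-square difference of the log-magnitude spectra in dB. The sketch below assumes this metric purely for illustration; the report does not specify which metrics were developed, and the function name and sample values are hypothetical.

```python
import numpy as np

def spectral_distortion(h_ref, h_test, eps=1e-12):
    """RMS log-magnitude difference in dB across frequency bins."""
    mag_ref = np.abs(np.asarray(h_ref)) + eps    # eps guards against log(0)
    mag_test = np.abs(np.asarray(h_test)) + eps
    diff_db = 20.0 * np.log10(mag_ref / mag_test)
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Hypothetical magnitude responses: the "model" is the measurement scaled by 2,
# i.e. a uniform level offset of about 6.02 dB in every bin.
h_measured = np.array([1.0, 0.5, 0.25, 0.125])
h_model = 2.0 * h_measured

sd_identical = spectral_distortion(h_measured, h_measured)
sd_scaled = spectral_distortion(h_measured, h_model)
```

A value of zero means the two magnitude responses coincide. A uniform gain offset shows up as a constant distortion, so responses are often level-normalized before such a metric is computed.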
The action results have been exploited and disseminated to the scientific community through the release of a new database of acoustical measurements and the source code for the structural HRTF model, and through the publication of high-quality scientific papers in international peer-reviewed journals and conferences. Both the publications and the code/databases are expected to become solid references for researchers in 3D audio and, more generally, in the European Sound and Music Computing (SMC) community. Their non-commercial exploitation is expected to lead to further research in this field and to further strengthen the position of both the researcher and the host in these research communities.