Virtual Reality Multimedia Analytics: An interactive approach to large-scale multimedia analysis in a virtual environment

Periodic Reporting for period 1 - ViRMA (Virtual Reality Multimedia Analytics: An interactive approach to large-scale multimedia analysis in a virtual environment)

Reporting period: 2020-09-01 to 2022-08-31

While the popular press often depicts a future where white-collar jobs have been taken over by artificial intelligence, it is more likely that many jobs (e.g. in hospitals, government administration, science, production) will require non-IT specialists to analyse, in real time, huge volumes of complex data with the help of a computer. Media files make up a large portion of these data volumes, and already today our personal and professional lives are full of large digital media collections that must be interactively explored, analysed, understood, annotated, and used. The ViRMA (Virtual Reality Multimedia Analytics) project sought to combine the state of the art in scalable multimedia analytics with the highly interactive access mechanism of virtual reality, enabling the development of a pioneering application to support innovative analysis of large-scale multimedia datasets. This research sought to extend the state of the art in the multimedia domain by addressing a confluence of three major trends in computing: interactive analysis, novel access mechanisms, and collection scale. The overall objectives of the ViRMA project were to incorporate elements of these major trends and to develop a pioneering, scalable, and interactive multimedia analytics system designed for virtual reality.
The primary research work carried out during ViRMA’s project life cycle was the development of a virtual reality application prototype, which was evaluated in three phases. The front-end application layer was developed in the Unity game engine and was served by a standalone REST API developed in C#, which utilised the Multi-dimensional Media Model (M3) as its database paradigm. This back-end server and database model were first implemented as part of the PhotoCube prototype, a master’s thesis project that sought to enable users to browse and explore their personal media collections in a web browser. To adequately serve ViRMA, the PhotoCube back-end server needed to be significantly enhanced and, due to project time constraints, had to be developed in parallel with ViRMA’s front-end application layer.
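To give a concrete sense of how the Unity front end might communicate with such a back end, the following is a minimal sketch of a Unity coroutine requesting cell data from a REST endpoint. The class name, base URL, endpoint path, and query parameters are illustrative assumptions and not ViRMA's actual API.

```csharp
// Hypothetical sketch only: endpoint names and parameters are assumptions,
// not ViRMA's real API surface.
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class ViRMAApiClient : MonoBehaviour
{
    // Assumed base URL of the standalone C# REST API serving the M3 data.
    private const string BaseUrl = "http://localhost:5000/api";

    // Request the media "cells" for a projection of three tag dimensions and
    // log the raw JSON; a real client would parse it and build the 3D scene.
    public IEnumerator GetCells(int xAxisTagsetId, int yAxisTagsetId, int zAxisTagsetId)
    {
        string url = $"{BaseUrl}/cell?xAxis={xAxisTagsetId}&yAxis={yAxisTagsetId}&zAxis={zAxisTagsetId}";
        using (UnityWebRequest request = UnityWebRequest.Get(url))
        {
            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Back-end request failed: {request.error}");
                yield break;
            }
            Debug.Log($"Received cell data: {request.downloadHandler.text}");
        }
    }
}
```

A component like this would be attached to a GameObject and driven via StartCoroutine(GetCells(...)); the UnityWebRequest.Result check assumes Unity 2020.2 or later.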

The three phases of evaluation were initially planned to target three real-time information retrieval competitions held at international conferences, namely the Lifelog Search Challenge (LSC) and the Video Browser Showdown (VBS). This would have meant targeting the LSC at ACM ICMR 2021, the VBS at MMM 2021, and then the LSC again at ACM ICMR 2022. However, due to the coronavirus pandemic and the rescheduling of conferences, the third and final evaluation was refocused as a small-scale user study. These real-time information retrieval competitions were targeted in order to compare and contrast our approach with other state-of-the-art large-scale information retrieval methodologies. The primary finding from participating in the LSC and VBS competitions was that ViRMA has great potential as a first-generation information retrieval prototype, but it is currently biased toward exploration over search. In practice, this means that ViRMA is especially useful for helping an individual learn what is inside a large multimedia dataset, but not very helpful when it comes to finding a specific item in that same dataset. This is problematic for the LSC and VBS competitions, which are heavily geared toward known-item search tasks because such tasks have a consistent goal and are easier to quantify. As a result, ViRMA did not perform well in either competition, but participation provided valuable insight into its current strengths and weaknesses.

The third and final evaluation, which was intended to analyse ViRMA’s exploration and browsing capabilities, became an in-depth user study with the lifelogger responsible for generating the dataset used in the LSC competition. A recurring challenge for the LSC lifelogger is browsing their own dataset to locate viable topics which can be used for the LSC tasks each year. Locating such topics is an exclusively exploration-oriented use case for which ViRMA is well suited, making it a natural fit for the final evaluation. The study took place over two sessions, each lasting approximately two hours, in which the lifelogger had unrestricted access to ViRMA. The primary goal of the study was to evaluate how many useful topics for future LSC competitions the lifelogger could locate. The results of the evaluation indicated a strong foundation for ViRMA’s exploration use case, with the lifelogger able to generate a pool of 47 potential topics for future LSC competitions from the same dataset.

The results from these three evaluations of the ViRMA prototype were disseminated via conference proceedings, where a physical demonstration of the virtual reality software was also presented to attendees, including online participants who viewed the demonstrations via video as part of hybrid conference attendance during the coronavirus pandemic. A fourth and final publication describing the entirety of the ViRMA project will be submitted to an academic journal on multimedia after the end of the project in September 2022.
ViRMA has advanced the state of the art by proposing an alternative approach within the domain in the form of a highly interactive, exploration-oriented multimedia analytics system for virtual reality. Though the initial goal was to develop a prototype which simultaneously addressed both search and exploration use cases, the first two prototype evaluations determined that the most recent iteration of the prototype was ill-suited to search in its current state and would require future work. ViRMA’s progress beyond the state of the art within the multimedia domain involved utilising the Multi-dimensional Media Model (M3), which enables users to visualise and explore media collections in 3D space using tags.

The socio-economic impact and wider societal implications of ViRMA’s contribution to the multimedia domain have great potential. One need only consider the incredible volume of multimedia data we as a society generate today, from our personal data collections to websites like YouTube or the vast archive of CCTV footage stored by private entities. One can easily imagine the benefit of an intuitive multimedia browsing platform which might enable us to better explore these collections for various purposes. Consider a detective investigating a large collection of video footage for evidence, or a doctor analysing a large collection of x-rays to determine the cause of a medical condition. Though virtual reality platforms have yet to establish themselves as pervasively as other technologies like smartphones, they are continuously improving and becoming more convenient, intuitive, and widespread. Though the current iteration of the ViRMA system runs on hardware that some might consider cumbersome, one need not look far into the future to envision a scenario where this is no longer the case, and the intuitive and immersive benefit of interacting with 3D data visualisations in virtual space is undisputed.
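As a rough illustration of the M3-style, tag-based exploration described above, the sketch below groups tagged media items into 3D cells keyed by three chosen tag dimensions. The types, dimension names, and grouping logic are simplified assumptions for illustration, not the actual M3 implementation.

```csharp
// Simplified, assumed illustration of projecting tagged media onto three tag
// dimensions ("cells"), in the spirit of the M3 model; not the real M3 code.
using System;
using System.Collections.Generic;
using System.Linq;

record MediaItem(string FileName, Dictionary<string, string> Tags);

static class M3ProjectionSketch
{
    // Group items into cells keyed by their tag values on the three chosen
    // axes; items missing a tag fall into an "untagged" bucket on that axis.
    static Dictionary<(string X, string Y, string Z), List<MediaItem>> Project(
        IEnumerable<MediaItem> items, string xAxis, string yAxis, string zAxis)
    {
        return items
            .GroupBy(i => (
                i.Tags.GetValueOrDefault(xAxis, "untagged"),
                i.Tags.GetValueOrDefault(yAxis, "untagged"),
                i.Tags.GetValueOrDefault(zAxis, "untagged")))
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    static void Main()
    {
        var items = new[]
        {
            new MediaItem("img_001.jpg", new() { ["year"] = "2020", ["location"] = "office", ["activity"] = "meeting" }),
            new MediaItem("img_002.jpg", new() { ["year"] = "2020", ["location"] = "home" })
        };

        // Each non-empty cell would correspond to a position in the 3D space
        // that a user could browse in virtual reality.
        foreach (var (cell, members) in Project(items, "year", "location", "activity"))
            Console.WriteLine($"Cell {cell}: {members.Count} item(s)");
    }
}
```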
Searching and navigating metadata information in ViRMA
Exploring and interacting with filtered media objects in ViRMA
3D visualisation of a multi-dimensional metadata space in ViRMA