From the outset, the focus of the project was to create a platform that addresses real-world media workflow challenges, so the first task was to engage with the industry and find out what was needed. A group of key industry users, drawn from the existing customers of the companies in the consortium, was interviewed. The results were collated, and a clear set of functional requirements emerged:
● Automatic Speech Recognition (ASR)
● Optical Character Recognition (OCR)
● Face detection & recognition
● Logo detection & recognition
Crucially, businesses were attracted to the benefits of using machine learning to address their workflow challenges but were hesitant to use cloud services. Media content is typically stored on-premises in large volumes, and moving it to the public cloud for processing creates additional workflow challenges and costs, along with the associated confidentiality and security risks. Additionally, public cloud services are available to everyone and are therefore difficult to train for specific scenarios.
To account for these challenges, the ReCAP solution needed to be an easy-to-train software application with built-in processing capability that could be deployed wherever needed and that could also consume public cloud machine learning services for specific tasks.
ToolsOnAir developed the Media Processing Service (MPS) from an existing workflow software application to provide a business logic layer of abstraction from the underlying algorithm technologies. By leveraging the GStreamer open source multimedia framework, media analysis pipelines could be assembled and run via an application programming interface (API).
The underlying algorithm technologies were developed and wrapped as GStreamer plugins by JOANNEUM RESEARCH and nablet. This work included the assessment and selection of different core machine learning open source projects, with ongoing improvements to the quality of the analysis. The algorithms were also benchmarked against comparable machine learning services to verify their performance and accuracy.
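As an illustration of how an analysis pipeline might be assembled from such plugins, the following minimal sketch builds a GStreamer pipeline description string for a set of requested video analysis tasks. The element names `recap_ocr`, `recap_face` and `recap_logo`, and the function `build_pipeline_description`, are hypothetical placeholders; the text does not give the actual plugin names or the MPS API. Only `filesrc`, `decodebin`, `videoconvert` and `fakesink` are standard GStreamer elements.

```python
# Hypothetical mapping from analysis task to GStreamer element name.
# The real ReCAP plugin names are not stated in the text.
ANALYSIS_ELEMENTS = {
    "ocr": "recap_ocr",
    "face": "recap_face",
    "logo": "recap_logo",
}


def build_pipeline_description(media_path, tasks):
    """Return a gst-launch-style description chaining the requested analysis elements."""
    unknown = [task for task in tasks if task not in ANALYSIS_ELEMENTS]
    if unknown:
        raise ValueError(f"unsupported analysis tasks: {unknown}")

    # Standard GStreamer elements: read the file, decode it, convert video frames.
    stages = [f'filesrc location="{media_path}"', "decodebin", "videoconvert"]
    # Append one (hypothetical) analysis element per requested task, in order.
    stages += [ANALYSIS_ELEMENTS[task] for task in tasks]
    # Discard the frames at the end; only the analysis results are of interest.
    stages.append("fakesink")
    return " ! ".join(stages)


description = build_pipeline_description("clip.mp4", ["ocr", "face"])
print(description)
```

With the GStreamer Python bindings installed, a string of this form could be passed to `Gst.parse_launch` to construct the pipeline, provided the named analysis elements are actually available as plugins.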
To demonstrate the software functionality, NMR designed an intuitive, web-browser-based graphical user interface (GUI) to control the MPS and to give users a frontend for quickly ingesting and analysing media content and viewing the results. The metadata generated by the analysis are stored in industry-standard formats; together with the API, this makes ReCAP easy to integrate into other systems and existing workflows.
ReCAP was demonstrated at various stages of development at industry trade shows to gather user feedback and generate interest in the project and its results. The first public demonstration was at NAB 2017 (National Association of Broadcasters, April 2017, Las Vegas). Subsequent events included the EBU Metadata Developer Workshop (June 2017, Geneva) and IBC 2017 (International Broadcasting Convention, September 2017, Amsterdam), followed by presentations at BVE 2018 (Broadcast Video Expo, February 2018, London) and NAB 2018 (April 2018, Las Vegas).
NMR also engaged with a European Commission Investment Expert Group initiative to assess the investment potential of products emerging from the Horizon 2020 project. The group assessed the leadership team capabilities, product readiness, market readiness and financial strategy of the project and concluded that ReCAP was “ready for investment.”