Automatic music transcription of polyphonic audio

Periodic Reporting for period 4 - DoReMIR (Automatic music transcription of polyphonic audio)

Reporting period: 2017-03-01 to 2017-09-30

The project goal was to develop a low-cost, cloud-based, polyphonic audio transcription solution based on an interdisciplinary approach and a user-driven design. In addition to finding better solutions to certain analysis problems, the resulting systems will also be able to communicate their results in musically meaningful, high-level terms.

The project’s objectives were structured in four key areas, each with clear, measurable key performance indicators (KPIs):
• Objective 1 (develop a cloud-based polyphonic music transcription solution based on new signal processing models): the project will develop a robust and accurate transcription engine able to process polyphonic music in the cloud and to provide the resulting data to any application using its APIs.
• Objective 2 (develop new tools and applications supporting new pedagogical models for music schools): the project will implement a first application example (score transcription application for music composing and music teaching).
• Objective 3 (develop a vibrant ecosystem around the technology and models): the project will publish APIs and encourage third party developers to come up with their own innovative application ideas, based on the polyphonic music transcription engine.
• Objective 4 (successfully translate research results into scalable technology): the project will develop all the necessary foundations for a successful exploitation (product market launch) after the end of the project.

Conclusions
We consider the project in general to be extremely successful in terms of research, technical development, product development, user interaction, user feedback, the development of a dissemination and exploitation strategy, and the creation of a platform for providing technology to 3rd parties.
Since the 1980s, the only way to successfully create musical scores or transform sounds with a polyphonic instrument on a computer has been MIDI: connecting the computer via cable and interface to an electronic instrument, which has boosted the development of electronic instruments tremendously. However, with the achievements made within the Polyphoni-X-Score project, we foresee that it is now time to “cut the wires” and connect the world of acoustic instruments and sounds to the digital representation of music, which is no less than a revolution in music technology.
In general, we have developed:
• a unique, world-class AMT engine with a lot of potential for further development, based on polyphonic pitch recognition using the most modern machine learning technology (deep learning) in combination with new developments of our unique cognitive music analysis that extends the frontiers of music technology. The performance of the model is close to MIDI input, with note recognition well above 90% for some instruments, which is an increase of 20-30% in relation to previous state-of-the-art models and beyond our expectations. Moreover, it is quite invariant to deficiencies in recorded sound quality, invariant to multitimbral input, and provides excellent timing information, close to what can be expected from MIDI input.
• an industrialization solution for the AMT engine that makes it scalable rather than confined to the lab. It uses new server technology, such as AWS Lambda, and computationally efficient solutions so that the technology can run on CPUs, and it has been used successfully on mobile devices, which have obvious limitations regarding e.g. recording quality, computing power and network connection reliability.
• an API system that both supports our own product development and opens the technology to 3rd party developers, thus providing solid ground for the development of the industry.
• a cross-platform application platform based on HTML5 web technology that makes it possible to create a whole line of products for mobile devices, which puts this technology at the forefront of current application development.
• innovative applications/products, based on interaction with users and music pedagogic institutions, that answer real user needs, solve musical problems related to music performance and learning, and open up new applications for music technology.
• not least, three public applications that make use of the technology developed within the project, launched or about to be launched.

This report covers the entire Polyphoni-X-Score project, which ran from May 2015 to September 2017.

As explained above, we have managed to complete the tasks of the project and achieved the desired KPIs for the Objectives outlined in the project application, as well as reaching the milestones described.
This has been made possible through a huge amount of work done by the project team within all work packages and good cooperation and support from project partners.

For Objective 1, we have managed to reach all KPIs. We are proud both to have developed a world-class AMT engine in-house, based on new innovative technology and our own unique music structure analysis system, and to have industrialized the system by integrating it into novel products currently available to the public. This was achieved even though we needed to develop an alternative solution quite late in the project.

For Objective 2, we have managed to reach the targets for the project by launching two new applications as well as a polyphonic version of our flagship product, ScoreCloud.
These applications are to a great extent the result of the intense collaboration with music teachers and music schools achieved within the project, and reflect the results of user tests and feedback from music students and music teachers.

For Objective 3, we have managed to reach the main targets for the project by creating a public API bank, called POND API, which constitutes the basis for the development of a vibrant ecosystem around the technology.
During the reporting period we have further developed POND and have had third-party developers using and testing the APIs. These collaborations have been vital for developing POND and our technology in general.

For Objective 4, we can conclude that we have managed to find a technical solution and develop an exploitation plan that is commercially viable, making it possible to combine performance requirements with cost-efficiency.

It is notable that the technical performance achieved is above what was expected when the goals for the project were set. Thanks to successful in-house development of polyphonic pitch recognition, the development and adaptation of music analysis for polyphonic audio, and the industrialization of the technology, we have managed to develop a system which is not only world class in terms of performance, but also scalable and commercially viable.

As mentioned above, we have made progress beyond the state of the art, e.g. in the technology for polyphonic pitch detection and music transcription, with results that exceed previous state-of-the-art results for note detection in polyphonic audio signals by 15-25 percent.
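
As an illustration of how note detection results of this kind can be scored, the following is a minimal sketch of a note-level comparison between a reference transcription and an engine's output. It is not the project's evaluation procedure, which this report does not describe. The function name `note_scores`, the (onset, pitch) note representation and the 50 ms onset tolerance are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed representation: (onset time in seconds, MIDI pitch number).
Note = Tuple[float, int]


def note_scores(reference: List[Note],
                estimated: List[Note],
                onset_tolerance: float = 0.05) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for detected notes.

    An estimated note counts as correct when an unmatched reference note
    has the same pitch and an onset within `onset_tolerance` seconds.
    Each reference note can be matched at most once.
    """
    unmatched = list(reference)
    true_positives = 0
    for onset, pitch in sorted(estimated):
        # Reference notes that could still be matched to this estimate.
        candidates = [r for r in unmatched
                      if r[1] == pitch and abs(r[0] - onset) <= onset_tolerance]
        if candidates:
            # Greedy choice: take the candidate with the closest onset.
            best = min(candidates, key=lambda r: abs(r[0] - onset))
            unmatched.remove(best)
            true_positives += 1

    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure
```

Called with a list of reference notes and a list of detected notes, the function returns the share of detected notes that are correct (precision), the share of reference notes that were found (recall) and their harmonic mean. The matching is greedy rather than optimal, which keeps the sketch short but can undercount matches in dense passages.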

No music transcription system on the market is as advanced and versatile as ours for polyphonic music, which is why the advance beyond the state of the art cannot be easily quantified.

Automatic Music Transcription - A dream becoming reality - Picture from 1899

Work Package Overview

Project Concept Overview
