CORDIS - EU research results

Getting at the Heart of Things: Towards Expressivity-aware Computer Systems in Music

Periodic Reporting for period 4 - Con Espressione (Getting at the Heart of Things: Towards Expressivity-aware Computer Systems in Music)

Reporting period: 2020-07-01 to 2021-12-31

What makes music so important, and what can make a musical performance or concert so special and stirring? It is the things the music expresses, the emotions it induces, the associations it evokes, the drama and characters it portrays. The sources of this expressivity are manifold: the music itself, its structure and orchestration, personal associations, social settings, but also, and very importantly, the act of performance: the interpretation and expressive intentions made explicit by musicians through nuances in timing, dynamics, etc. Thanks to research in fields like Music Information Research (MIR), computers can do many useful things with music, from beat and rhythm detection to song identification and tracking. However, they are still far from grasping the essence of music: they cannot tell whether a performance expresses playfulness or ennui, solemnity or gaiety, determination or uncertainty; they cannot produce music with a desired expressive quality; and they cannot interact with human musicians in a truly musical way, recognising and responding to the expressive intentions implied in their playing.

The project is about developing machines that are aware of certain dimensions of expressivity, specifically in the domain of (classical) music, where expressivity is both essential and, at least as far as it relates to the act of performance, traceable to well-defined and measurable parametric dimensions (such as timing, dynamics and articulation). The project focuses on developing computer systems that can recognise and characterise the expressive aspects of music, and generate and react to expressive qualities in music. To do so, we need to (1) bring together the fields of AI, Machine Learning, Music Information Retrieval (MIR), and Music Performance Research; (2) integrate theories from musicology to build better-founded models of music understanding; and (3) support model learning and validation with musical corpora of a size and quality unprecedented in computational music research.

The resulting computer technologies include computational models of expressive piano performance (autonomous and interactive); deep neural networks that recognise intended emotions and expressive character in music recordings; systems that successfully track expressive performances in real time; and a multitude of computer models of musical structure perception - all of which will be useful for a wide variety of purposes, such as more refined music search and recommendation systems, or new, musically 'sensitive' computer systems for interactive music making. A specific demonstrator, targeted from the start and in the end successfully developed and presented to a wide audience, is the "ACCompanion": a computer that plays together with a human pianist in a musically natural and expressive way, recognising and anticipating the pianist's expressive intentions and adapting its own playing style to match the expressive quality of the music, making for a natural musical interaction and experience.

To lay the foundations, we performed substantial research on computational models of structure recognition (e.g. rhythm, harmony, recurring themes) and on listening models (e.g. perception of harmonic tension). Several of the resulting algorithms achieved top results in international scientific competitions, and some were shown experimentally to improve computer models of expressive music performance.
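
To give a concrete flavour of what a simple listening model might compute, the following sketch derives a crude per-frame "harmonic tension" curve from chroma features. It is purely illustrative and assumes the librosa and scipy libraries; the project's actual listening models are substantially more sophisticated.

```python
# Illustrative sketch only: a crude proxy for perceived harmonic tension,
# measured as the distance of each chroma frame from its local tonal context.
import numpy as np
import librosa
from scipy.ndimage import uniform_filter1d

def rough_tension_curve(audio_path, hop_length=2048, context_frames=50):
    """Return a per-frame proxy for harmonic tension (illustrative, not the project's model)."""
    y, sr = librosa.load(audio_path)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=hop_length)
    # Local tonal context: a slow moving average over the chroma frames.
    context = uniform_filter1d(chroma, size=context_frames, axis=1)
    # Tension proxy: cosine distance between each frame and its tonal context.
    num = (chroma * context).sum(axis=0)
    denom = np.linalg.norm(chroma, axis=0) * np.linalg.norm(context, axis=0) + 1e-9
    return 1.0 - num / denom
```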

A second line of research focused on the characterisation of expressive qualities in music and, specifically, in expressive performances. Distinct semantic dimensions of "expressive character" were identified, and machine learning models were developed that can recognise such expressive qualities, as well as more basic emotional categories, from music recordings. A special aspect is that these models can explain their decisions in terms of intuitively interpretable perceptual concepts, which gives additional insight.
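
As an illustration of the kind of architecture that permits such explanations, the sketch below wires up a two-stage model in which audio descriptors are first mapped to named perceptual concepts, and a linear classifier then predicts an emotion category from those concepts. All feature names, labels and data here are hypothetical placeholders; the project's published models are far richer.

```python
# Illustrative sketch only (hypothetical features and data, not the project's model):
# a two-stage "explainable" emotion recogniser. Audio descriptors are first mapped
# to interpretable perceptual concepts; a linear model then predicts the emotion
# from those concepts, so each decision can be explained in human-readable terms.
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(0)
CONCEPTS = ["melodiousness", "articulation", "rhythmic_complexity", "dissonance"]
EMOTIONS = ["tender", "joyful", "dramatic"]

# Placeholder data standing in for real annotated recordings.
X_audio = rng.normal(size=(200, 40))                   # low-level audio descriptors
y_concepts = rng.normal(size=(200, len(CONCEPTS)))     # annotated perceptual concepts
y_emotion = rng.integers(0, len(EMOTIONS), size=200)   # emotion labels

# Stage 1: audio descriptors -> interpretable perceptual concepts.
concept_model = Ridge(alpha=1.0).fit(X_audio, y_concepts)
# Stage 2: perceptual concepts -> emotion category (linear, hence inspectable).
emotion_model = LogisticRegression(max_iter=1000).fit(
    concept_model.predict(X_audio), y_emotion)

def explain(x):
    """Predict an emotion and report which concepts drove the decision."""
    c = concept_model.predict(x[None, :])[0]
    probs = emotion_model.predict_proba(c[None, :])[0]
    winner = int(np.argmax(probs))
    contrib = emotion_model.coef_[winner] * c
    return EMOTIONS[winner], dict(zip(CONCEPTS, np.round(contrib, 2)))

print(explain(X_audio[0]))
```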

A central line of research was concerned with computational models of expressive performance: computer programs that learn to associate expressive playing patterns (relating to tempo, timing, dynamics and articulation) in human performances with patterns found in the score (the sheet music) of the piece. These programs learn to predict how a given musical passage should most likely be played expressively. A central result is the "Basis Function Model", a comprehensive formal model of expressive performance based on the latest methods from machine learning (deep neural networks). One version of this model was reported to have passed a "musical Turing test" [E. Schubert et al., J. New Mus. Res. 46(2), 2017], producing a piano performance that was judged, by a large listening panel, to be at least as "human" as the performance of a professional concert pianist. For a popular presentation of this, see https://www.sciencesquared.eu/why-is-music#why-music-so-expressive-computers-want-know.
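
The following sketch illustrates the general idea behind such a model: each notated note is encoded as a vector of simple score descriptors ("basis functions"), and a regressor learns to map these to per-note expressive parameters. The specific features, data and network below are hypothetical placeholders; the actual Basis Function Model uses a much richer feature set and deep (recurrent) networks trained on large corpora of aligned scores and performances.

```python
# A minimal sketch in the spirit of the Basis Function Model (illustrative only):
# encode notated notes as basis-function vectors, then learn a mapping to
# expressive performance parameters measured from human performances.
import numpy as np
from sklearn.neural_network import MLPRegressor

def score_basis_functions(notes):
    """Encode notated notes as basis-function vectors (simple illustrative descriptors)."""
    feats = []
    for n in notes:  # each note: dict with pitch, onset_beat, duration_beats, marking
        feats.append([
            n["pitch"] / 127.0,                        # pitch height
            float(n["onset_beat"] % 4 == 0),           # downbeat indicator
            n["duration_beats"],                       # notated duration
            1.0 if n["marking"] == "forte" else 0.0,   # dynamics markings
            1.0 if n["marking"] == "piano" else 0.0,
        ])
    return np.asarray(feats, dtype=float)

# Placeholder data standing in for aligned score/performance corpora:
# targets are per-note expressive parameters extracted from human performances.
rng = np.random.default_rng(1)
notes = [{"pitch": int(p), "onset_beat": float(b), "duration_beats": 1.0,
          "marking": rng.choice(["forte", "piano", "none"])}
         for p, b in zip(rng.integers(40, 90, 500), np.arange(500) * 0.5)]
X = score_basis_functions(notes)
y = rng.normal(size=(500, 3))   # [log tempo deviation, loudness, articulation ratio]

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500).fit(X, y)
predicted_expression = model.predict(X[:10])   # predicted expressive parameters
```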

Another line of research focused on interactive aspects of performance: specifically, algorithms that can reliably track and synchronise with live performances, and strategies for combining live tracking and synchronisation with real-time expressive playing and adaptation. This work eventually formed the basis for our "ACCompanion", an interactive, "co-expressive" musical accompaniment system that accompanies a human pianist, adapting to the human's expressive playing and combining this with its own expressive performance decisions. Such a system had been envisioned in the original project proposal as a final demonstrator bringing together the different lines of research.
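
As a very rough illustration of the tracking side of such a system, the sketch below aligns incoming performance feature frames against precomputed score frames with a simple greedy, monotone local-matching scheme, yielding a current score position that an accompaniment process could follow. It is a toy stand-in using invented data; the ACCompanion's actual tracker and accompaniment strategies are considerably more elaborate.

```python
# Illustrative sketch of online score-performance alignment (simplified):
# each incoming live feature frame advances an estimated position in the score
# by greedy local matching within a small look-ahead window.
import numpy as np

class OnlineScoreFollower:
    def __init__(self, score_frames):
        self.score = np.asarray(score_frames, dtype=float)  # one feature vector per score frame
        self.position = 0                                    # current index into the score

    def step(self, live_frame, window=8):
        """Advance the estimated score position given one live performance frame."""
        lo = self.position
        hi = min(len(self.score), self.position + window)
        # Local cost: Euclidean distance to candidate score frames ahead of us.
        costs = np.linalg.norm(self.score[lo:hi] - np.asarray(live_frame), axis=1)
        self.position = lo + int(np.argmin(costs))           # greedy, monotone update
        return self.position

# Usage with placeholder chroma-like features standing in for a real performance:
rng = np.random.default_rng(2)
score_frames = rng.random((100, 12))
follower = OnlineScoreFollower(score_frames)
for live_frame in score_frames + 0.05 * rng.normal(size=score_frames.shape):
    pos = follower.step(live_frame)   # estimated score position, frame by frame
```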

Throughout the project, efforts were made to publicise and disseminate the research to various audiences. To name just three examples: we curated a pavilion at a large public science festival in the heart of Vienna (with more than 30,000 visitors); we contributed the Con Espressione! Exhibit, an interactive didactic installation, to the exhibition "The Mathematics of Music" in Heidelberg, Germany (2019-2021); and at the Falling Walls Science Summit 2021 in Berlin we were named "Science Breakthrough of the Year 2021, Category Art & Science" and staged a live presentation involving the world premiere of our ACCompanion (see above). This presentation was live-streamed to a world-wide audience and is now openly available via the Falling Walls YouTube channel: https://www.youtube.com/watch?v=KE6WhYxuWLk

Progress beyond the state of the art was made along all the research directions mentioned above, as evidenced by a multitude of scientific publications - from structure perception (where we won various international scientific challenges), to emotion recognition (where we won first prize in a pertinent emotion recognition challenge), to expressive music performance (where our Basis Function Model successfully passed a musical "Turing test"). We have shown how to improve expressive performance models by integrating models of musical listening (e.g. the perception of musical tension and relaxation, and the formation of expectations in the listener). We developed the first computer program that can track expressive performances (audio) and follow along in the score, in real time, working directly from plain sheet music. And finally, we presented the first interactive accompaniment system that reacts to expressive playing style and combines this with its own interpretation strategies.

Man-Machine Collaboration in Expressive Performance: The Con Espressione Exhibit
Autonomous Expressive Accompaniment: The ACCompanion