Periodic Reporting for period 4 - Con Espressione (Getting at the Heart of Things: Towards Expressivity-aware Computer Systems in Music)
Reporting period: 2020-07-01 to 2021-12-31
The project is about developing machines that are aware of certain dimensions of expressivity, specifically in the domain of (classical) music, where expressivity is both essential and - at least as far as it relates to the act of performance - can be traced back to well-defined and measurable parametric dimensions (such as timing, dynamics, articulation). The project focuses on developing computer systems that can recognise and characterise expressive aspects of music, and generate and react to expressive qualities in music. To do so, we need to (1) bring together the fields of AI, Machine Learning, Music Information Retrieval (MIR), and Music Performance Research; (2) integrate theories from musicology to build more well-founded models of music understanding; (3) support model learning and validation with massive musical corpora of a size and quality unprecedented in computational music research.
The resulting computer technologies include computational models of expressive piano performance (autonomous and interactive); deep neural networks that recognise intended emotions and expressive character in music recordings; systems that successfully track expressive performances in real time; and a multitude of computer models of musical structure perception - all of which will be useful for a wide variety of purposes, such as more refined music search and recommendation systems, or new musically 'sensitive' computer systems for interactive music making. A specific demonstrator, targeted from the start and ultimately developed and presented to a wide audience, is the "ACCompanion": a computer that plays together with a human pianist in a musically natural and expressive way, recognising and anticipating the pianist's expressive intentions, and adapting its playing style to match the expressive quality of the music, making for a natural musical interaction and experience.
A second line of research focused on the characterisation of expressive qualities in music and, specifically, in expressive performances. Distinct semantic dimensions of "expressive character" were identified, and machine learning models were developed that can recognise such expressive qualities, but also more basic emotional categories, from music recordings. A special aspect is that these models can explain their decisions, using intuitively interpretable perceptual concepts, which gives additional insight.
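To illustrate the idea behind such self-explaining recognition models, the following is a minimal, purely hypothetical sketch (not the project's actual models or features): a linear classifier over named, perceptually meaningful descriptors, so that every prediction can be decomposed into per-concept contributions. The concept names, synthetic data, and labelling rule are all invented for illustration.

```python
import numpy as np

# Hypothetical perceptual concepts that a listener could interpret directly.
concepts = ["tempo", "loudness", "mode_majorness", "staccato_articulation"]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))  # synthetic per-excerpt concept descriptors

# Synthetic labelling rule: fast, loud, major-mode excerpts count as "happy".
y = (X @ np.array([1.0, 1.0, 1.5, 0.2]) > 0).astype(float)

# Fit a logistic-regression classifier by plain gradient descent.
w = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))          # predicted probability of "happy"
    w -= 0.1 * X.T @ (p - y) / len(y)        # gradient step on log-loss

# Because the model is linear in named concepts, a single prediction can be
# explained as a sum of per-concept contributions w_i * x_i.
x = X[0]
contributions = dict(zip(concepts, w * x))
top_concept = max(contributions, key=lambda c: abs(contributions[c]))
print("most influential concept for this excerpt:", top_concept)
```

The design point is the explanation step at the end: an interpretable (here linear) layer over human-meaningful concepts trades some raw accuracy for the ability to say *why* an excerpt was judged to have a given character.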
A central line of research was concerned with computational models of expressive performance: computer programs that learn to associate expressive playing patterns (relating to tempo, timing, dynamics, articulation) in human performances with patterns found in the score (the sheet music) of the piece. These programs learn to predict how a given musical passage should most likely be played expressively. A central result is the "Basis Function Model", a comprehensive formal model of expressive performance, based on the latest methods from machine learning (deep neural networks). One version of this model was reported to have passed a "Musical Turing test" [E. Schubert et al., J.New.Mus.Res. 46(2), 2017], producing a piano performance that was judged, by a large listening panel, to be at least as "human" as the performance of a professional concert pianist. For a popular presentation of this, see https://www.sciencesquared.eu/why-is-music#why-music-so-expressive-computers-want-know .
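The core idea of the basis-function approach can be sketched in a few lines. The following is an illustrative toy (assumed, not the project's implementation): each score note is encoded as a vector of basis functions (score features such as pitch, metrical position, or dynamics markings), and a model is fitted to map these features to expressive parameters such as local tempo and loudness. A linear least-squares fit stands in here for the deep neural networks used in the actual model; all variable names and dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n_notes, n_basis = 100, 5
# Basis-function matrix: one row of score features per note.
score_features = rng.normal(size=(n_notes, n_basis))

# Hypothetical expressive targets per note: [log tempo deviation, loudness].
true_weights = rng.normal(size=(n_basis, 2))
performance = score_features @ true_weights + 0.1 * rng.normal(size=(n_notes, 2))

# Fit the linear basis-function model by least squares.
weights, *_ = np.linalg.lstsq(score_features, performance, rcond=None)

# Predict how a previously unseen passage would most likely be played.
new_passage = rng.normal(size=(10, n_basis))
predicted_expression = new_passage @ weights
print(predicted_expression.shape)  # one (tempo, loudness) pair per note
```

Replacing the linear map with a neural network keeps the same structure - score features in, expressive parameters out - while allowing non-linear interactions between score features.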
Another line of research focused on interactive aspects of performance: specifically, algorithms that can reliably track and synchronise with live performances, and strategies for combining live tracking and synchronisation with real-time expressive playing and adaptation. These eventually formed the basis for our "ACCompanion", an interactive, "co-expressive" musical accompaniment system that accompanies a human pianist, adapting to the human's expressive playing and combining this with its own expressive performance decisions. This had been envisioned in the original project proposal as a final demonstrator, bringing together the different lines of research.
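One small ingredient of such live tracking can be sketched as follows. This is a simplified toy (assumed for illustration, not the ACCompanion's actual algorithm): the soloist's current tempo is estimated from the last few matched note onsets by a least-squares line fit, and the clock time of the next accompaniment note is predicted from it. All function names and the example data are invented.

```python
def estimate_tempo(onset_times, score_beats, window=4):
    """Estimate seconds-per-beat from the last `window` matched onsets,
    as the slope of a least-squares line time = slope * beat + offset."""
    t = onset_times[-window:]
    b = score_beats[-window:]
    n = len(t)
    mean_t, mean_b = sum(t) / n, sum(b) / n
    num = sum((bi - mean_b) * (ti - mean_t) for bi, ti in zip(b, t))
    den = sum((bi - mean_b) ** 2 for bi in b)
    return num / den

def schedule_next(onset_times, score_beats, next_beat):
    """Predict the clock time at which to play the note at `next_beat`."""
    spb = estimate_tempo(onset_times, score_beats)
    return onset_times[-1] + (next_beat - score_beats[-1]) * spb

# A soloist gradually slowing down from 0.5 to 0.6 seconds per beat:
onsets = [0.0, 0.5, 1.02, 1.58, 2.18]
beats = [0, 1, 2, 3, 4]
print(schedule_next(onsets, beats, 5))  # scheduled time of beat 5
```

A real accompaniment system layers much more on top of this - robust score following, handling of wrong or missing notes, and the system's own expressive timing - but the feedback loop of "track, re-estimate tempo, reschedule" is the basic mechanism of synchronisation.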
Throughout the project, efforts were made to publicise and disseminate the research to various audiences, such as (to name just three) curating a pavilion as part of a big public science festival in the heart of Vienna (with more than 30,000 visitors); the Con Espressione! Exhibit - an interactive didactic installation for the exhibition "The Mathematics of Music" in Heidelberg, Germany (2019-2021); and the Falling Walls Science Summit 2021 in Berlin, where we were named the "Science Breakthrough of the Year 2021, Category Art & Science" and staged a live presentation involving the world premiere of our ACCompanion (see above); this presentation was live-streamed to a world-wide audience and is now openly available via the Falling Walls YouTube channel: https://www.youtube.com/watch?v=KE6WhYxuWLk