Raising co-creativity in cyber-human musicianship

Periodic Reporting for period 3 - REACH (Raising co-creativity in cyber-human musicianship)

Reporting period: 2024-01-01 to 2025-06-30

Digital cultures are increasingly pushing forward a deep interweaving between human creativity and the autonomous computation capabilities of surrounding environments, modeling joint human-machine action into new forms of shared reality. Co-creativity between humans and machines will bring about the emergence of distributed information structures, creating new performative situations with mixed artificial and human agents and significantly impacting human development. To this end, the REACH project aims at understanding, modeling, and developing musical co-creativity between humans and machines through improvised interactions, allowing musicians to develop their skills and expand their individual and social creative potential. Indeed, improvisation is at the very heart of all human interactions, and music is a fertile ground for developing models and tools of creativity that can be generalized to human social activity. REACH studies shared musicianship occurring at the intersection of the physical, human and digital spheres as an archetype of distributed intelligence, and produces models and tools as vehicles to better understand and foster music creativity that is increasingly intertwined with computation. REACH is based on the hypothesis that co-creativity in cyber-human systems results from the emergence of coherent behaviors and structure formation arising from cross-learning and information transfer between agents, as is inherent to complex systems. It crosses approaches from AI, musical informatics, cognitive and social sciences, and mixed reality.
We have made a number of advances since the beginning of REACH in setting up a theoretical and practical machine learning framework that serves our general objective of combining generative models and interaction, effectively feeding our "Deep Structure Discovery" project package.
This allowed us to present novel deep-learning models based on the well-known "Transformer" technology from Google Brain and on the "Contrastive Learning" idea, which lets a system learn in a self-supervised way. We could then combine several modalities of text, audio and music in the learned representation, making it possible to generate high-quality music samples from users' text descriptions thanks to a new diffusion model, MusicLDM. These techniques have been put to the test on stage in public concerts, e.g. during the Improtech music festival (improtech.ircam.fr).
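To make the contrastive-learning component concrete, the minimal sketch below shows a symmetric InfoNCE-style objective of the kind used to align audio and text embeddings in a shared space; the function and variable names are illustrative assumptions, not the project's actual code.

    # Minimal sketch of a CLAP-style contrastive objective (illustrative only).
    import torch
    import torch.nn.functional as F

    def contrastive_loss(audio_emb, text_emb, temperature=0.07):
        """Symmetric InfoNCE loss pulling paired audio/text embeddings together."""
        # L2-normalise both embedding batches (shape: [batch, dim])
        audio_emb = F.normalize(audio_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        # Pairwise cosine-similarity matrix, scaled by a temperature
        logits = audio_emb @ text_emb.t() / temperature
        # Matching audio/text pairs lie on the diagonal
        targets = torch.arange(logits.size(0), device=logits.device)
        # Cross-entropy in both directions (audio-to-text and text-to-audio)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2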
At this stage of the REACH project, we try to make coherent sense of the different stages of listening, training, generation and interactive experience. Following Prof. Shlomo Dubnov's earlier work on Music Information Dynamics (MID), we have proposed a novel Deep Music Information Dynamics (DMID) framework that combines the quality of the latent representation learned by deep AI frameworks on one side with accurate prediction of changes in the musical information distribution over time on the other, into a unified theoretical framework that mathematically explains the information transfers between agents (Symmetric Transfer Entropy).
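As a concrete illustration of the information-transfer quantity at stake, the toy estimator below computes transfer entropy between two discretised musical event sequences; the symmetric combination shown (summing both directions) is an assumed simplification for illustration, not the project's exact SymTE formulation.

    # Toy estimator of transfer entropy between two discrete symbol sequences.
    import numpy as np
    from collections import Counter

    def transfer_entropy(x, y, base=2):
        """TE_{X->Y}: extra predictability of y[t+1] given x[t], beyond y[t] alone."""
        triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # counts of (y_next, y, x)
        pairs_yx = Counter(zip(y[:-1], x[:-1]))         # counts of (y, x)
        pairs_yy = Counter(zip(y[1:], y[:-1]))          # counts of (y_next, y)
        singles_y = Counter(y[:-1])                     # counts of y
        n = len(x) - 1
        te = 0.0
        for (y_next, y_prev, x_prev), c in triples.items():
            p_joint = c / n
            p_cond_xy = c / pairs_yx[(y_prev, x_prev)]                 # p(y_next | y, x)
            p_cond_y = pairs_yy[(y_next, y_prev)] / singles_y[y_prev]  # p(y_next | y)
            te += p_joint * np.log(p_cond_xy / p_cond_y) / np.log(base)
        return te

    def symmetric_transfer_entropy(x, y):
        """Sum of both directions, used here as a simple symmetric coupling score."""
        return transfer_entropy(x, y) + transfer_entropy(y, x)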

On the engineering side, we have had a sustained development activity aimed at upgrading co-creative tools and creating new ones. The REACH software ecosystem now comprises a family of concrete computational environments that address different aspects of the musical (improvisatory) mind: Djazz includes scenario and beat structure, Somax2 is a reactive program that adapts continuously to the musician's changes, and DYCI2 combines reactivity with micro-scenarios. A visual component is also being developed in order to extend the co-creative capacities to image creation and animation in a synchronized way. As an example, one of our flagship tools, Somax2, is a highly original tool structured around five main "skills": a latent space built once and for all by machine learning algorithms trained on a large musical data set, encoding general harmonic and textural knowledge; a real-time machine listening device able to segment, analyse and encode musical streams into discrete components matched against the latent space; a discrete sequential learning model able to figure out the pattern organisation of musical streams and form a state-based memory structure; a cognitive memory model able to evolve continuous envelopes over the sequential state structure in time, representing the activation rate (hot spots) with regard to ever-shifting internal and external influences and viewpoints; and a set of interaction policies determining how and when to react to influences.
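To give a sense of how activation envelopes and interaction policies can drive such a reactive agent, the following highly simplified sketch lets listening events boost matching states in a memory, lets activations decay over time, and reads the next output from the currently most activated region; this is an illustrative toy under assumed names, not the actual Somax2 implementation.

    # Toy activation-based memory and interaction loop (illustrative only).
    import numpy as np

    class ToyCoImproviser:
        def __init__(self, memory_labels, decay=0.9):
            self.labels = list(memory_labels)             # state-based memory (e.g. pitch or chord labels)
            self.activation = np.zeros(len(self.labels))  # continuous activation envelope over states
            self.decay = decay                            # temporal decay of activations

        def influence(self, label, weight=1.0):
            """A machine-listening event boosts every memory state matching the label."""
            self.activation *= self.decay
            for i, stored in enumerate(self.labels):
                if stored == label:
                    self.activation[i] += weight

        def generate(self):
            """Interaction policy: continue from the most activated ("hottest") state."""
            if self.activation.max() == 0:
                return None
            i = int(self.activation.argmax())
            self.activation[i] *= 0.5                     # damp the chosen spot to avoid looping
            return self.labels[min(i + 1, len(self.labels) - 1)]

Feeding such an agent a stream of listened labels and reading back generate() reproduces, in miniature, the influence/reaction loop described above.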

Finally, on the social science side, experiments were carried out on the notion of acceptability applied to musical avatars computed by AI. In particular, we built an avatar of the great Belgian harmonicist Toots Thielemans by extracting his solos from the record "Affinity" with the Bill Evans trio (1979) using our deep zero-shot extraction method, and submitted them as training data to Djazz, which was then asked to regenerate new improvisations that were mixed back with the accompaniment. The avatar was presented at the Brussels Royal Library on the occasion of Toots' centenary and drew great attention among the experts. Other important doctoral studies have been launched in order to examine how co-creative software extensions can be used over popular social networks (such as TikTok) to communicate with other musicians, recruit them into shared experiences, and then form new communities.

Overall, the REACH ecosystem has already produced a body of theoretical knowledge in deep structure discovery, a collection of practical co-creative tools already used in large real-life artistic applications, and a human science research environment fostering anthropological, cognitive and social advances.
REACH achievements have brought novel methodologies, have been successfully recognized by the research community, and have pushed beyond the state of the art.
For example, our HTS-AT hierarchical audio transformer model, used to produce the combined latent representation of music and general audio content, is regarded as one of the state-of-the-art audio classification models on more than three benchmarks, including AudioSet, ESC-50 and Speech Commands V2. It has been widely used in many follow-up works, including multi-modality learning, sound event detection and audio source separation. The CLAP contrastive language-audio pretraining model received a lot of attention from the music and audio community because of its high performance on text-to-audio retrieval tasks and its strong generalization across downstream tasks. Our zero-shot audio source separation model opens a new topic within conditional audio source separation: it achieves separation performance competitive with the state of the art on music source separation while being trained entirely without audio source data. Our transfer entropy method SymTE gives a quantitative score of the appropriateness of musical co-improvisation and is a first step towards solving the improvisation influence problem (how an improvising agent's signal is computed from another, co-improvising agent's signal). Such decisions are fundamentally important in improvisation settings, where musicians trade the precision of the momentary musical sound against the flow of musical form and the co-creation of musical discourse, and they will bring considerable improvements to the REACH co-creative ecosystem.
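As an illustration of the text-to-audio retrieval setting in which CLAP performs well, the sketch below ranks a collection of audio clips against a text query by cosine similarity in a shared embedding space; the function is a generic assumption for illustration, not CLAP's published interface.

    # Sketch of text-to-audio retrieval in a joint embedding space (illustrative only).
    import numpy as np

    def rank_audio_by_text(text_embedding, audio_embeddings, top_k=5):
        """Return indices of the top_k audio clips closest to the text query."""
        t = text_embedding / np.linalg.norm(text_embedding)
        a = audio_embeddings / np.linalg.norm(audio_embeddings, axis=1, keepdims=True)
        scores = a @ t                      # cosine similarity in the shared space
        return np.argsort(scores)[::-1][:top_k]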