DIGITAL Devices for mUltimodal Expression Through vocal Synthesis

Final Report Summary - DIGITAL DUETS (DIGITAL Devices for mUltimodal Expression Through vocal Synthesis)

DIGITAL_DUETS is a multi-disciplinary project that spans diverse fields, including acoustics, physiology and music. Its aim is the design of a new technology capable of precisely simulating the physics underlying airwave propagation and voice production, yet flexible enough to be employed both as a simulation tool and as a means for musical expression. This is directly connected to the two final objectives of the project. The first is the design of an articulatory speech synthesizer allowing for the fast acoustic modeling of patient-specific vocal tract geometries; by capitalizing on an innovative physical modeling implementation, this device would outperform any similar system, being capable of simulating and controlling the nuances of the vocal spectrum in real time or quasi real time. The second objective concerns the exploration of the same technology in an artistic context, and consists of the definition of new paradigms in music composition and performance. To this end, the designed physical models would become the core of an innovative Digital Musical Instrument (DMI), supporting the audio/visual manipulation of sound propagation in real time.

In the first two years of the project, I worked alongside Prof. Sidney Fels, leader of the Human Communication Technologies lab at the University of British Columbia (Vancouver, Canada). There I collaborated with a team highly specialized in the bio-mechanical and acoustic modeling of the human upper vocal tract, and led the design and implementation of the innovative DIGITAL_DUETS physical models. This process capitalized on the development of a groundbreaking technology that leverages the parallel computational power of commodity graphics cards to accelerate the solution of complex mathematical systems. The implementation of all the physical models conceived during the project relies on this novel approach, which allowed for the precise and fast simulation of a variety of real acoustic systems, encompassing excitation, resonance and radiation phenomena. The core of this set of models is a series of Finite Difference Time Domain (FDTD) solvers that simulate airwave propagation in two dimensions in real time; each of these can be coupled with any of the other models, which simulate acoustic excitation phenomena and can be divided into two groups: musical instrument excitations (i.e. reeds, labium, buzzing lips) and a set of models of the human glottis of increasing complexity. The FDTD flow models have two more unique features. The first is that the simulation is both visual and sonic; the traveling waves are rendered on screen, and at any location of the simulated space they can be sampled and turned into an audio stream in real time. The second feature concerns the high level of interaction that the system grants; the acoustic parameters of the simulated space, as well as its geometrical boundaries, can be modified while the models are running, thanks in particular to the graphical feedback. As a result, the models allow users to see and listen to how sound propagates through an interactive space.
Figure 1 depicts, on the right, how the system visualizes the propagation of a sinusoidal excitation in free space and, on the left, a Gaussian impulse interacting with a boundary wall. Figure 2 and Figure 3 are animations showing the real-time modification of the boundaries in open space and in an irregular tube, respectively.
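To give a flavour of how such a flow model works, the following is a minimal 2D FDTD sketch in Python. It is illustrative only: the actual DIGITAL_DUETS solvers run on the GPU with far more sophisticated boundary handling and excitation models, and all names and parameter values here are assumptions. It shows the three ingredients described above: a time-stepped wave-equation update on a grid, an editable boundary, and a "microphone" cell whose samples form an audio stream.

```python
import numpy as np

# Illustrative 2D FDTD for the scalar wave equation (leapfrog scheme):
#   p_next = 2*p - p_prev + (c*dt/dx)^2 * laplacian(p)
# All constants below are example values, not the project's settings.

N = 128                      # grid size (cells per side)
c, dx = 343.0, 0.01          # speed of sound (m/s), cell size (m)
dt = dx / (2.0 * c)          # time step, well within the 2D CFL limit
coeff = (c * dt / dx) ** 2   # Courant number squared

p = np.zeros((N, N))         # pressure field at time t
p_prev = np.zeros((N, N))    # pressure field at time t - dt

wall = np.zeros((N, N), dtype=bool)
wall[:, 80] = True           # a boundary column; could be edited while running

src = (64, 32)               # sinusoidal excitation location
mic = (64, 60)               # "listening" cell, sampled into an audio stream
audio = []

for n in range(400):
    # Five-point Laplacian via array rolls (periodic at the outer edges)
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p)
    p_next = 2 * p - p_prev + coeff * lap
    p_next[wall] = 0.0       # pressure-release boundary (crude stand-in for a wall)
    p_next[src] += np.sin(2 * np.pi * 440.0 * n * dt)  # 440 Hz source term
    p_prev, p = p, p_next
    audio.append(p[mic])     # sampling the field yields the audio signal

audio = np.asarray(audio)    # the rendered field itself gives the visual output
```

In the real system this loop runs on the graphics card, the field is drawn to screen every step, and `wall` can be repainted interactively while the simulation runs.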
When a flow model is coupled with a glottal excitation model, the system turns into a speech synthesizer. The synthetic glottis can be used to produce oscillating waves that travel through a tube-like geometry modeling the shape of a human vocal tract, ending with a radiating mouth opening. The acoustic parameters of the walls can be tuned to simulate the absorption coefficients of real tissues, and the tract's shape can be modified in real time to mimic articulation. The most sophisticated flow and glottal models that can be combined to form the DIGITAL_DUETS speech synthesizer are the 2.5D FDTD flow model and the Vocal Fold Continuum model. The former is an improved version of an implicit 2D FDTD solver, capable of simulating airwave propagation in 3D tubes that are symmetric along at least one axis; the latter is a precise simulation of the glottis and its vibrations, based on the solution of the 2D fluid-structure interaction between a deformable geometric model of the vocal folds and the lung pressure that makes them oscillate.
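As a rough illustration of the kind of source signal a glottal model feeds into the tract, the following sketches a classic Rosenberg-style glottal pulse train in Python. This is a deliberately simple stand-in: the project's Vocal Fold Continuum model solves a full 2D fluid-structure interaction rather than generating a fixed pulse shape, and every name and value below is an assumption for illustration.

```python
import numpy as np

# Rosenberg-style glottal pulse: a raised-cosine opening phase followed by
# a quarter-cosine closing phase, zero while the glottis is closed.
# Parameter values are illustrative, not taken from the project.

fs = 16000                    # sample rate (Hz)
f0 = 120.0                    # fundamental frequency (Hz)
T = int(fs / f0)              # samples per glottal cycle
open_q, close_q = 0.6, 0.3    # open/closing fractions of the cycle

def rosenberg_cycle(T, open_q, close_q):
    """One cycle of normalized glottal flow (0..1)."""
    t_o = int(open_q * T)     # samples in the opening phase
    t_c = int(close_q * T)    # samples in the closing phase
    g = np.zeros(T)
    n = np.arange(t_o)
    g[:t_o] = 0.5 * (1.0 - np.cos(np.pi * n / t_o))       # glottis opening
    m = np.arange(t_c)
    g[t_o:t_o + t_c] = np.cos(0.5 * np.pi * m / t_c)      # glottis closing
    return g                  # remainder stays 0: closed phase

pulse = rosenberg_cycle(T, open_q, close_q)
source = np.tile(pulse, 10)   # ten cycles of glottal flow, ready to inject
```

Injecting a signal like `source` at the glottal end of the simulated tract, and letting the FDTD field propagate it through the tube geometry to the mouth opening, is the basic coupling that turns the flow model into a speech synthesizer.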
The system can also be interfaced with a bio-mechanical model designed during the project by Prof. Fels' team. This component simulates the hard and soft tissues of the human upper vocal tract, and can register the rendered geometries to the anatomy of real subjects, using Magnetic Resonance Imaging data as reference. When coupled with the bio-mechanical model, the speech synthesizer can approximate the 3D geometries within the chosen FDTD flow model, and synthesize subject-specific speech via a glottal excitation. This technology has the potential to revolutionize oro-pharyngeal treatment planning, by enabling the immediate prediction of the effects that surgical operations and radiotherapy for cancer may have on patient-specific speech quality. The interfacing between the synthesizer and the bio-mechanical model is shown in Figure 4, while Figure 5 depicts the full subject-specific speech synthesis process.
In the third year of DIGITAL_DUETS, I joined Prof. Caldwell's Department of Advanced Robotics at the Istituto Italiano di Tecnologia (Genoa, Italy), to tackle the second objective of the project and leverage the flexibility of the designed technology to create an innovative DMI. Supported by a local team of researchers skilled in interaction and physical computing, I designed a control interface tailored to the features of the DIGITAL_DUETS audio/visual physical models; this consists of a custom 42'' transparent multitouch display on which users can visualize and interact with the waves produced by the diverse excitation models. Figure 6 is a shot of the interface at rest, while Figure 7 shows the rendering of propagating waves.
To turn this apparatus into an expressive musical instrument, I collaborated with 7 semi-professional musicians who took part in a series of co-design sessions; the aim was to define a series of audio/visual interaction paradigms capable of engaging the psychophysical skills of composers and performers. The result is an award-winning instrument, called the Hyper Drumhead, that allows for idiomatic playing techniques inspired by augmented/hyper instruments, while leveraging the tremendous flexibility of physical modeling synthesis. A video presentation of the instrument is available online. The paper describing its groundbreaking design won the Best Paper Award at the 43rd International Computer Music Conference (Shanghai, China), one of the most important music technology conferences; furthermore, it gained first prize at the Guthman Musical Instrument Design Competition 2018 (Atlanta, US), obtaining the title of best instrument of the year. This is the most prestigious award in the field, gathering an international array of the best designers, and the most innovative instruments, from both academia and industry.
Moreover, the Hyper Drumhead strongly impacted electronic music making and musical performance practice. A group of local musicians has already started composing with the instrument, exploring the unique possibilities unlocked by the real-time manipulation of sound wave propagation. The first pieces composed with this technology were presented in a one-day festival I co-organized in Genoa with the cultural association Disorder Drama. During this event, I introduced the instrument as well as the full scope of DIGITAL_DUETS in a public talk, followed by several demo sessions in which the audience had the chance to try some of the innovative physical modeling and interactive technologies designed over the last 3 years; the event concluded with the performance of an international headliner.
The Hyper Drumhead and its underlying technology are at once an unprecedented musical device and a research platform. Their potential is attracting composers, instrumentalists and scientists interested in further advancing digital lutherie and, above all, in the creation of new music.