Community Research and Development Information Service - CORDIS

Periodic Report Summary 1 - DIGITAL DUETS (DIGITAL Devices for mUltimodal Expression Through vocal Synthesis)

Description of the Project Objectives
DIGITAL_DUETS has two main objectives. The first is the development of a novel articulatory speech synthesizer for the exploration of new paradigms in synthetic voice production; the aim is to develop a device expressive enough to be used as a fully functional Digital Musical Instrument (DMI). The second objective is the fostering of new composition and performance techniques, leveraging the developed synthesizer in combination with interactive multimodal technologies. Alongside these two main objectives, the project also targets advances in medical technology and a related socio-cultural impact.

Work Performed Since the Beginning of the Project
The bulk of the work performed during the first two years of the outgoing phase has been the design and development of a novel 2D acoustic model for the targeted speech synthesizer. Based on a Finite-Difference Time-Domain (FDTD) scheme, the model exploits the parallelism available in modern commodity GPUs to solve the acoustic wave equations (systems of partial differential equations) in real time or quasi real time. The core of the model features a highly innovative implementation, based on a set of shaders running directly on the GPU and managed by a C++ infrastructure. As a result, the system can simulate the propagation of pressure waves in 2D within arbitrary domains, both sonically and visually, and much faster than any other system currently available. This part of the work was done in collaboration with Microsoft Research.
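As a rough illustration of the numerical scheme at the core of the model, the following minimal CPU-side sketch performs a single 2D FDTD pressure update (a leapfrog discretization of the wave equation). In the actual system the equivalent per-cell update runs inside GPU shaders, so all names and parameters below are illustrative assumptions rather than project code.

```cpp
#include <cstddef>
#include <vector>

// Minimal reference implementation of one 2D FDTD update (leapfrog scheme)
// for the scalar wave equation. In the project the equivalent per-cell
// update runs inside GPU shaders; this CPU version only illustrates the math.
struct Grid {
    std::size_t w, h;
    std::vector<float> prev, curr, next;  // pressure at t-1, t, t+1
    Grid(std::size_t w_, std::size_t h_)
        : w(w_), h(h_), prev(w_ * h_, 0.f), curr(w_ * h_, 0.f), next(w_ * h_, 0.f) {}
    float& at(std::vector<float>& p, std::size_t x, std::size_t y) const { return p[y * w + x]; }
};

// courant2 = (c * dt / dx)^2; stability in 2D requires courant2 <= 0.5.
void fdtdStep(Grid& g, float courant2) {
    for (std::size_t y = 1; y + 1 < g.h; ++y) {
        for (std::size_t x = 1; x + 1 < g.w; ++x) {
            // Discrete Laplacian of the current pressure field.
            float lap = g.at(g.curr, x + 1, y) + g.at(g.curr, x - 1, y)
                      + g.at(g.curr, x, y + 1) + g.at(g.curr, x, y - 1)
                      - 4.f * g.at(g.curr, x, y);
            g.at(g.next, x, y) = 2.f * g.at(g.curr, x, y)
                               - g.at(g.prev, x, y)
                               + courant2 * lap;
        }
    }
    g.prev.swap(g.curr);  // rotate time levels: old t becomes t-1 ...
    g.curr.swap(g.next);  // ... and the freshly computed t+1 becomes t
}
```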
As described in the first project objective, the acoustic model needs to be controlled in real time to be part of a fully functional articulatory speech synthesis algorithm. To reach this goal, the second part of the implementation consisted of modifying parts of the infrastructure code to expose most of the synthesis parameters and turn the model into an interactive application (as opposed to a mere simulation environment). Control parameters include the synthesis of the excitation waveform as well as the shape of the simulated domain. This process was quite challenging: the two parts of the model (the core and the infrastructure) continuously exchange control data, but are bound to threads that run on separate architectures, thus requiring different optimization strategies. The new structure and routines were specifically designed to be compatible with generic control interfaces, whose development was scheduled for the second year of the project.
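One plausible pattern for this kind of cross-thread exchange is sketched below: the control thread writes new parameter values, and the thread driving the GPU copies the latest complete set once per simulation step before uploading it (e.g. as uniforms or textures). The structure and names are assumptions made for illustration, not the project's actual code.

```cpp
#include <mutex>

// Hypothetical set of control parameters exposed by the infrastructure.
struct ControlParams {
    float excitationFreq = 110.f;  // glottal excitation frequency [Hz]
    float excitationGain = 1.f;    // excitation amplitude
    int   domainShapeId  = 0;      // index of the selected vocal-tract shape
};

// The control thread pushes new values; once per simulation step the thread
// that drives the GPU copies the latest complete set and uploads it before
// launching the next FDTD pass.
class ParamExchange {
public:
    void write(const ControlParams& p) {
        std::lock_guard<std::mutex> lock(m_);
        params_ = p;
    }
    ControlParams read() const {
        std::lock_guard<std::mutex> lock(m_);
        return params_;
    }
private:
    mutable std::mutex m_;
    ControlParams params_;
};
```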
Once combined with highly optimized control routines, the 2D acoustic model was further modified, turning it from a generic pressure propagation model into a precise simulation of the aero-acoustics of the human upper vocal tract. This process was composed of two main steps, both based on the analysis of data from real subjects, such as Magnetic Resonance Imaging (MRI) scans and acoustic measurements. The first step consisted of modelling the boundaries of the system, i.e., the coupling with the vocal folds and the behavior of the walls inside the vocal tract. The second step aimed at matching the aero-acoustics of complex 3D geometries, like those of real vocal tracts, while simulating propagation in two dimensions only. To do so, two different approaches were explored in sequence: one based on the deformation of real vocal tract geometries, the other capitalizing on a more sophisticated augmentation of the 2D equations used, to partially take into account the effect of transversal waves in the third dimension.
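Purely to give a flavour of what such boundary terms amount to in an FDTD setting, the sketch below injects a simple glottal-like pulse train at the cells marked as the glottal end and damps the pressure at wall cells. The real model uses far more accurate vocal-fold and wall formulations, so the names, waveform, and coefficients here are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-cell tags for a 2D vocal-tract-like domain (illustrative only).
enum class Cell : std::uint8_t { Air, Wall, Glottis };

// Crude glottal excitation: a raised-cosine pulse train at frequency f0 [Hz].
// The actual synthesizer uses a proper vocal-fold/excitation model.
float glottalSample(float t, float f0) {
    const float kPi = 3.14159265f;
    float phase = t * f0 - std::floor(t * f0);  // position within the glottal cycle, 0..1
    float open  = 0.6f;                         // fraction of the cycle the folds are open
    return (phase < open) ? 0.5f * (1.f - std::cos(2.f * kPi * phase / open)) : 0.f;
}

// Applied after the interior FDTD update:
//  - glottal cells are driven by the excitation waveform,
//  - wall cells have their pressure damped to mimic lossy, yielding walls.
void applyBoundaries(std::vector<float>& pressure, const std::vector<Cell>& cells,
                     float t, float f0, float wallAbsorption /* 0..1 */) {
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (cells[i] == Cell::Glottis)    pressure[i]  = glottalSample(t, f0);
        else if (cells[i] == Cell::Wall)  pressure[i] *= (1.f - wallAbsorption);
    }
}
```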
In parallel with the modification of the acoustic model, we have been carrying out the design and development of novel gestural control devices to interface with our system. Our aim was to build DMIs that allow for direct manipulation of the visual representation of the simulation, one of the most innovative features of the system. This choice also allows us to segue into the exploration of multimodal mappings, scheduled for the last year. The first device prototype consisted of an interactive pen display, whose stylus can be used to draw and modify the boundaries of the domain directly on screen. A second, more sophisticated solution leveraged large-scale multi-touch technology: we built a custom 42'' transparent projection screen with a capacitive film attached to one side, so that one or more performers can interact directly with the visual representation of the system while the audience, standing on either side of the glass, can still clearly see both the interaction and the rendering.
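In practice, this kind of direct manipulation amounts to mapping screen-space input events onto cells of the simulation grid and toggling their state; a minimal sketch of the idea, with all names assumed, is given below.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mapping from stylus/touch coordinates (in pixels) to grid cells.
// Drawing on the display turns the corresponding cells into walls, so the
// performer reshapes the simulated domain while the simulation is running.
struct DomainEditor {
    std::size_t gridW, gridH;          // simulation grid size
    float screenW, screenH;            // display size in pixels
    std::vector<std::uint8_t>* walls;  // 1 = wall cell, 0 = air cell (size gridW*gridH)

    void onStylus(float px, float py, bool erase) const {
        // Map pixel coordinates to the nearest grid cell.
        auto x = static_cast<std::size_t>(px / screenW * float(gridW - 1) + 0.5f);
        auto y = static_cast<std::size_t>(py / screenH * float(gridH - 1) + 0.5f);
        if (x < gridW && y < gridH)
            (*walls)[y * gridW + x] = erase ? 0 : 1;
    }
};
```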
The flexibility of the developed technology also allowed us to explore an interesting side project. We built a completely novel DMI, called the Hyper Drumhead, which combines a modified version of the 2D acoustic model with an interactive graphical user interface to create virtual percussive instruments. Thanks to the massively parametric nature of the synthesis algorithm, the Hyper Drumhead can simulate realistic instruments and materials, but can also produce sounds that go beyond the laws of physics.
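The "massively parametric" aspect can be pictured as every cell of the grid carrying its own propagation and damping values, which can be kept within physically plausible ranges or deliberately pushed outside them. The snippet below only sketches this idea and is not the Hyper Drumhead's implementation.

```cpp
#include <cstddef>
#include <vector>

// Illustrative per-cell material description for a virtual drumhead.
// With per-cell values, one region of the membrane can behave like taut skin
// and another like rubber, or like no material that physically exists.
struct CellMaterial {
    float courant2;  // (c*dt/dx)^2: local wave-speed term, <= 0.5 for 2D stability
    float damping;   // per-step loss term, 0 = lossless
};

// In a damped-wave update each cell then uses its own values, e.g.:
//   next = 2*curr - prev + courant2 * laplacian - damping * (curr - prev);
std::vector<CellMaterial> makeUniformMembrane(std::size_t cells,
                                              float courant2, float damping) {
    return std::vector<CellMaterial>(cells, CellMaterial{courant2, damping});
}
```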

Main Results Achieved So Far
The 2D acoustic model designed and developed during these two years of work is one of the most relevant achievements of the project. Together with the custom control routines integrated in the system, it forms the basis for the targeted articulatory speech synthesis algorithm. A second important achievement is the development of DMIs that capitalize on the same algorithm, as described in the first project objective. The project team worked with different versions of the acoustic model and combined them with two interfaces and control metaphors. In line with the second objective of the project, these instruments are still in development; however, they have already allowed us to explore novel paths in musical expression.
Dissemination has also been a central theme of these two years. The vocal tract acoustic model was presented at academic conferences, its description was published in a journal article, and its details were shared online through a tutorial that includes the full source code. Furthermore, other team members (including supervised students) and I have been invited to give talks about the project and to perform with the developed technology. We believe that the interest and support shown by the several communities we came across are a great achievement of DIGITAL_DUETS.
Finally, I underwent thorough training covering research, teaching, management and artistic skills. This prepared me to progress to the next step of my career, as described in one of the central points of the original proposal.

Expected Final Results and their Potential Impact and Use
The exploration of novel ways of composing and performing music is one of the main expected results of the project. From this perspective, the contribution is twofold: the designed instruments allow for a revolutionary way of multimodally controlling the simulation of wave propagation, supporting new music and expression; furthermore, the underlying technology can be used to implement other types of physical modelling synthesis, fostering the design of a new series of instruments within the DMI community.
Thanks to the strong coupling between the audio and visual rendering of the real-time simulations, the system also becomes a neat tool for better understanding the theory of sound propagation in air. This has already proved very useful during teaching and dissemination activities.
However, much of the potential of the system finds its application in the medical domain. The possibility of quickly synthesizing the utterances of a specific patient from MRI scans is of great appeal to the partners we are working with, as a means to safely diagnose speech, swallowing and breathing disorders, safely plan surgery and foresee its outcome.

Although a project website was not planned in the original proposal, I informally keep track of DIGITAL_DUETS updates on my personal website, alongside my past projects.
Link: http://toomuchidle.com/projects/digital_duets-digital-devices-for-multimodal-expression-through-vocal-synthesis/

Contact

Simona Ventriglia (Administrative Assistant)
Tel.: +0039 01071781 285

Subjects

Life Sciences