Extended Subtractive Synthesis of Audio Signals

Final Report Summary - ESUS (Extended Subtractive Synthesis of Audio Signals)

Computer music is now firmly established in the contemporary musical world, whether for recording, mixing, composing or playing new (virtual) instruments. Moreover, digital techniques for sound analysis, synthesis, transformation and coding are now commonly used in many other applications, for example telecommunications, mobile technology, cinematography, video games and virtual reality. In answer to this demand, a new digital audio synthesis technique is proposed: a generalization of subtractive synthesis.

The basic idea of subtractive synthesis is to imitate a given periodic sound by filtering a fixed excitation signal; it is based on the “source-filter principle”. The fundamental frequency of the source is tuned to the desired pitch, and the filter is designed to reproduce the original timbre of the target sound. Paradoxically, since the 1960s this technique has become very popular not for its realism but for its ability to produce new, unheard sounds. The proposed “Extended Subtractive Synthesis” (ESUS) aims to achieve sound realism by drawing on recent advances in audio signal processing.
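As a rough illustration of the source-filter principle, the following Python sketch filters a sawtooth source with a simple low-pass filter. It shows classic subtractive synthesis only, not the ESUS method: the function names, the one-pole filter and all numeric values (220 Hz pitch, 1 kHz cutoff, 44.1 kHz sample rate) are assumptions chosen for brevity.

```python
import math


def sawtooth(f0, sample_rate, n_samples):
    """Naive (non-band-limited) sawtooth in [-1, 1) with fundamental f0 in Hz."""
    return [2.0 * ((f0 * n / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """Apply y[n] = (1 - a) * x[n] + a * y[n-1] with a = exp(-2*pi*fc/fs)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


# Source: sets the pitch. Filter: shapes the timbre by attenuating upper harmonics.
source = sawtooth(220.0, 44100, 4410)            # 0.1 s of a 220 Hz sawtooth
tone = one_pole_lowpass(source, 1000.0, 44100)   # hypothetical 1 kHz cutoff
```

Changing `f0` alters the pitch without touching the filter, while changing `cutoff_hz` alters the brightness without touching the pitch; this independence is what the source-filter principle relies on.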

Additive synthesis is another popular synthesis method. It consists of summing the sinusoidal components of a Fourier series. Although it is well suited to most musical sounds, it requires considerable memory and computation time. A different approach is physical-modelling synthesis, which simulates part of the internal behaviour of a given instrument. Although its realism can be outstanding, such a synthesizer is not universal: every musical instrument has to be modelled individually.
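The cost of additive synthesis can be seen in a minimal Python sketch, given here under illustrative assumptions (the function name, the 1/k amplitude law and the numeric values are not taken from the project). Every output sample requires one sine evaluation per partial, so the work grows with the number of partials.

```python
import math


def additive_tone(f0, amplitudes, sample_rate, n_samples):
    """Sum harmonic sinusoids: partial k has frequency k * f0 and amplitude amplitudes[k-1]."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        # One sine evaluation per partial, for every single output sample.
        out.append(sum(a * math.sin(2.0 * math.pi * k * f0 * t)
                       for k, a in enumerate(amplitudes, start=1)))
    return out


# Hypothetical sawtooth-like spectrum: 20 partials with amplitudes 1/k.
amps = [1.0 / k for k in range(1, 21)]
tone = additive_tone(220.0, amps, 44100, 441)
```

For time-varying timbres the amplitude of each partial would also have to be stored or computed per frame, which is where the memory requirement comes from.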

The generalization of subtractive synthesis is highly promising with respect to three objectives:
- universality (restricted to quasi-periodic signals such as harmonic musical instrument tones),
- low-cost implementation (allowing real-time synthesis on most inexpensive processors) and
- good perceptual quality and realism.
Note that harmonic and quasi-harmonic sounds represent the majority of musical audio signals, as they include the tones produced by most orchestral instruments, such as bowed and plucked strings and wind instruments.

However, although the source-filter principle is well suited to speech analysis-synthesis, it has no physical justification for most musical instruments. Consequently, to reach the three challenging objectives, the analysis relies not on the physical production of the sound but on knowledge of human auditory perception, as do most lossy audio compression schemes (MP3, for instance). In essence, the modelling accuracy is concentrated on the highly perceptible sound components and relaxed for the imperceptible ones. Because the underlying principle nevertheless differs from that of standard compression methods, new techniques had to be developed.

To meet these requirements, three consecutive tools have been developed. Starting from the magnitudes and frequency locations of the harmonics, which describe the periodic signal, the first step provides an accurate estimate of the spectral envelope. The spectral envelope is a function that represents the global shape of a signal in the frequency domain; it is one of the determining factors of the timbre of a sound, or in other words its “color”. In contrast to previously existing methods, this technique allows auditory masking to be used together with full control of the modelling error. The second step is a perceptual smoothing that removes imperceptible details from the estimated spectral envelope, yielding a significantly more regular envelope that is perceptually similar to the original. In the third step, the coefficients of the simulated filters are optimized using a perceptually based criterion, which takes into account the frequency resolution of the ear, the perceptual loudness and the auditory threshold. Note that, under the source-filter principle, the simulated filters aim to imitate the original spectral envelope; this three-step procedure therefore provides a filter identification based on a perceptual approximation of the original timbre of the sound.
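To make the notion of a spectral envelope concrete, the Python sketch below draws a curve through the harmonic peaks by piecewise-linear interpolation of their levels in dB. This is a deliberately naive stand-in, not the estimator developed in the project: it includes no auditory masking, no error control and no perceptual smoothing, and the function name and harmonic levels are purely illustrative assumptions.

```python
def envelope_db(harmonic_freqs, harmonic_db, query_hz):
    """Piecewise-linear envelope through (frequency, level) harmonic peaks.

    harmonic_freqs must be sorted in increasing order; queries outside the
    covered range are clamped to the first or last harmonic level.
    """
    if query_hz <= harmonic_freqs[0]:
        return harmonic_db[0]
    if query_hz >= harmonic_freqs[-1]:
        return harmonic_db[-1]
    for i in range(len(harmonic_freqs) - 1):
        f_lo, f_hi = harmonic_freqs[i], harmonic_freqs[i + 1]
        if f_lo <= query_hz <= f_hi:
            w = (query_hz - f_lo) / (f_hi - f_lo)
            return (1.0 - w) * harmonic_db[i] + w * harmonic_db[i + 1]


# Illustrative (invented) harmonic peaks of a 220 Hz tone, levels in dB.
freqs = [220.0, 440.0, 660.0]
levels = [0.0, -6.0, -12.0]
midpoint_level = envelope_db(freqs, levels, 330.0)  # halfway between harmonics 1 and 2
```

The envelope is defined between the harmonics even though the signal carries no energy there, which is why a filter can be fitted to it and then reused at other pitches.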

For the sound synthesis of musical instruments, we derived a new simulation scheme in accordance with subtractive synthesis. In this structure, a periodic source generator first synthesises a modified sawtooth, as in standard subtractive synthesis. The noise part of the sound is simulated separately using the same principle, except that the source signal is white noise produced by a random generator; this noise part corresponds, for example, to the breath noise of a flute or the bow friction of a violin. Second, the simulated filter is factorized into a chain composed of the following filters:
- Instrument filter: this stationary filter aims to reproduce the “representative” timbre of the whole instrument. It is a relatively high-order filter providing a fine approximation.
- Tone filter: this stationary filter aims to reproduce the “representative” timbre of one tone.
- Modulation filter: this low-cost and time-varying filter reproduces the natural modulation of the timbre in time.
- Velocity filter: this low-cost filter accounts for the dependence of the timbre on the dynamics, or velocity.
This factorized form isolates different properties of the sound in different elements. For example, the stationary, time-invariant part is isolated in the instrument and tone filters. Modifying a filter over time is costly because it requires a time interpolation of its coefficients, so confining the time-varying part to low-cost filters makes time modulation easier. Likewise, the pitch-independent part is isolated in the instrument filter, which is placed at the end of the filter chain. Consequently, in polyphonic synthesis, all tone contributions are summed and fed into the single instrument filter, so the most expensive element is simulated only once, whatever the number of simultaneous tones.
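The saving in the polyphonic case follows from the linearity of the filters, which the following Python sketch checks numerically. It is an assumption-laden toy: one-pole filters stand in for the tone and instrument filters, the modulation and velocity filters are omitted, and all names, coefficients and input signals are hypothetical.

```python
def one_pole(signal, a):
    """Linear filter y[n] = (1 - a) * x[n] + a * y[n-1], used as a stand-in filter."""
    out, y = [], 0.0
    for x in signal:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def mix(signals):
    """Sum several equal-length signals sample by sample."""
    return [sum(samples) for samples in zip(*signals)]


# Three hypothetical voices, each with its own (cheap) tone filter coefficient.
voices = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
tone_coeffs = [0.2, 0.4, 0.6]
instrument_coeff = 0.9  # stand-in for the expensive, pitch-independent filter

# Naive layout: the instrument filter runs once per voice (three times here).
per_voice = mix([one_pole(one_pole(v, a), instrument_coeff)
                 for v, a in zip(voices, tone_coeffs)])

# Shared layout: sum the tone-filtered voices, then run the instrument filter once.
shared = one_pole(mix([one_pole(v, a) for v, a in zip(voices, tone_coeffs)]),
                  instrument_coeff)
```

Both layouts give the same output up to floating-point rounding, but the shared layout evaluates the instrument filter once instead of once per voice.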

Starting from recordings of all isolated notes of a given instrument, over its whole tessitura and for several dynamics (e.g. pianissimo, mezzo-piano, forte), we developed a program for automatic analysis. This off-line analysis relies on a time-varying pitch estimation (to follow natural vibrato, for example), on a decomposition into periodic and noise parts, and on the three-step procedure briefly presented above. Its outputs are the coefficients of the source generators and of the filters of the chain. Although the off-line analysis is complex and time-consuming, the synthesis structure is easy to implement because it consists only of simple source generators and linear filters, stationary or time-varying. In addition, the digital sound synthesis is inexpensive and can therefore easily be computed in real time, even on low-cost processors (CPUs).

Because these three challenging objectives have been achieved, the proposed method has a significant impact on the sound synthesis of musical tones. First, the analysis-synthesis method allows a wide range of instruments to be simulated, in contrast to physically based synthesis. Second, the cost of real-time synthesis has been reduced thanks to the proposed filter chain. Third, the perceptual quality of the resulting sounds has been improved by the new three-step procedure. Although quality and computational efficiency have been improved simultaneously, a trade-off between them remains possible by adjusting the filter orders.

Moreover, the three proposed tools for spectral envelope estimation, perceptual smoothing and perceptual filter approximation may be relevant to other audio applications, such as speech coding and synthesis, audio enhancement, and digital audio effects and modifications, as well as to other topics in signal processing and mathematics.

Figure legend:
Illustration of the three-step procedure. First, the spectral envelope of the original sound spectrum (gray line) is estimated and smoothed (black line). Second, the perceptual filter approximation is computed (red line). The method concentrates the approximation accuracy where the spectrum is perceptible and relaxes it where it is imperceptible.
