PhD proposal Aeroacoustic modelling of speech

At present, the theoretical models used in speech synthesis and in speech analysis (inverse filtering, for instance) rely on low-frequency acoustic propagation models (one-dimensional approximation) and on the assumption that the source of sound and the acoustical resonator are independent. The one-dimensional approximation can be justified, to a certain extent, for voiced sounds, owing to the low-frequency behaviour of the glottal source and to its position inside the vocal tract. This is not the case for plosives and fricatives, for which one can expect the generation and propagation of higher acoustical modes. These higher modes are then not only predominant inside the resonator but also have a strong effect on the directivity of the radiated sound. Based on anatomical considerations, one can estimate the first cut-on frequency of these higher acoustical modes to lie around 4-5 kHz, which is in the middle of a typical speech spectrum and close to the maximum sensitivity of the human ear. The perceptual effects of these higher acoustical modes can therefore be expected to be considerable. Further, during the production of a vowel-consonant sequence, the aerodynamical and aeroacoustical coupling between the two different sources of sound is overlooked or poorly described in the literature. Lastly, in the case of fricatives, the sound source cannot be considered as localised. More elaborate theoretical models are clearly needed. The goal of this PhD is to study these acoustical and coupling effects, theoretically and experimentally, within the framework of aeroacoustics.
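As a rough illustration of the 4-5 kHz estimate, the sketch below evaluates the textbook cut-on frequency of the first non-planar mode in an idealised rigid-walled duct, for both a circular and a rectangular cross-section. The speed of sound (350 m/s, warm humid air), the transverse dimensions, and the function names are assumptions chosen for illustration; they are not taken from the proposal or from measured anatomy.

```python
import math

# First zero of the derivative of the Bessel function J1. It sets the
# cut-on of the first non-planar mode in a rigid-walled circular duct.
J1_PRIME_FIRST_ZERO = 1.8412


def cut_on_circular(radius_m, c=350.0):
    """First cut-on frequency (Hz) of a rigid circular duct.

    c is an assumed speed of sound in m/s.
    """
    return J1_PRIME_FIRST_ZERO * c / (2.0 * math.pi * radius_m)


def cut_on_rectangular(width_m, c=350.0):
    """First cut-on frequency (Hz) of a rigid rectangular duct.

    The mode cuts on when half a wavelength fits across the largest
    transverse dimension width_m.
    """
    return c / (2.0 * width_m)


if __name__ == "__main__":
    # Hypothetical transverse sizes, for illustration only.
    for width_cm in (3.5, 4.0, 4.5):
        f_rect = cut_on_rectangular(width_cm / 100.0)
        f_circ = cut_on_circular(width_cm / 200.0)  # radius = width / 2
        print(f"size {width_cm} cm: rectangular {f_rect:.0f} Hz, "
              f"circular {f_circ:.0f} Hz")
```

With transverse dimensions of a few centimetres, both idealised geometries place the first cut-on in the range of a few kilohertz, which is consistent with the order of magnitude quoted above.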

Theoretical aspects

From the theoretical point of view, this PhD will start with the modelling of three-dimensional propagating and radiating acoustical waves. Given the dimensions of the human vocal tract, a modal approach seems well adapted for this purpose. The solution of the wave equation is analytical in the case of a simple geometry and can be extended numerically to more complex resonator shapes (closer to the human vocal tract) by a mode-matching procedure. The modelling of the aeroacoustical interactions between sound sources and an acoustical resonator will be based on existing models developed at the Gipsa-Lab. At first, analytical models accounting for the nature of the sound source and its interaction with the vocal tract walls will be considered. The next step will account for the nature of the airflow itself.
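To make the modal picture concrete for the simplest analytical case, the sketch below computes the axial wavenumber of mode (m, n) in a uniform rigid-walled rectangular duct. It is a minimal illustration under stated assumptions (rigid walls, no flow, no losses, a time-harmonic field varying as exp(1j * kz * z), an assumed speed of sound of 350 m/s, hypothetical duct dimensions); it is not the model developed at the Gipsa-Lab, and it does not include the mode-matching step needed for varying cross-sections.

```python
import cmath
import math


def axial_wavenumber(freq_hz, m, n, width_m, height_m, c=350.0):
    """Axial wavenumber kz (rad/m) of mode (m, n) in a rigid rectangular duct.

    A real kz means the mode propagates. A purely imaginary kz (positive
    imaginary part) means the mode is evanescent: with the assumed
    exp(1j * kz * z) dependence its amplitude decays along z.
    """
    k = 2.0 * math.pi * freq_hz / c  # free-field wavenumber
    k_transverse_sq = (m * math.pi / width_m) ** 2 + (n * math.pi / height_m) ** 2
    return cmath.sqrt(k * k - k_transverse_sq)


def is_propagating(kz):
    """True when the mode carries energy along the duct (kz is real)."""
    return kz.imag == 0.0 and kz.real > 0.0


if __name__ == "__main__":
    # Hypothetical 4 cm x 2 cm cross-section, for illustration only.
    width, height = 0.04, 0.02
    for freq in (1000.0, 3000.0, 6000.0):
        kz = axial_wavenumber(freq, 1, 0, width, height)
        state = "propagating" if is_propagating(kz) else "evanescent"
        print(f"{freq:.0f} Hz, mode (1, 0): kz = {kz:.2f} rad/m, {state}")
```

The plane wave (0, 0) propagates at every frequency, whereas each higher mode only contributes to the far field above its own cut-on. This is why a one-dimensional model is adequate at low frequencies and incomplete above a few kilohertz.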

Within the framework of the EC “Eunison” project, direct numerical simulations (finite elements) will be performed to complement this theoretical work.

Experimental aspects

Using data derived from MRI, casts of vocal tracts will be produced with a 3-D printer. Acoustical measurements will be made on these casts, inside an anechoic room, using a sound probe driven by a micrometric 3-D stage positioning system. The measured results will then be compared with the theoretical predictions. In order to mimic in vitro the generation of consonants, a mechanical replica of the vocal tract will be designed on the basis of an existing prototype. This mechanical replica will have a mobile part, controlled by a stepper motor, in order to reproduce an articulatory gesture when producing a consonant. Such a replica will therefore be able to dynamically produce constrictions (in the case of fricative consonants) or complete occlusions (in the case of plosives). By coupling it with a self-oscillating vocal fold replica, voiced consonants will be considered as well. Measurements of localised pressure and of velocity fields (using Particle Image Velocimetry techniques), together with some flow visualisations, will be performed and compared with the theoretical predictions.

Application to Speech Synthesis

Once validated against the in vitro experiments, the theoretical models will be implemented in the numerical software for articulatory speech synthesis developed in the laboratory. The acoustical outputs (i.e. the synthetic speech) obtained using theoretical models of increasing complexity (one-dimensional, three-dimensional, direct numerical simulations, ...) will be compared and evaluated.

Location: Gipsa-Lab, University Campus, Grenoble
Grant: EC project “Eunison”
Duration: 3 years
Supervisors: Xavier Pelorson ([email protected]) and Annemie Van Hirtum ([email protected])