Verification of Articulatory Phonetics Features with Quantitative Data

  • Réka Trencsényi, Debreceni Egyetem
  • László Czap
Keywords: quantitative tongue description; articulatory phonetics; place of articulation; talking head; viseme features

Abstract

This paper aims to verify the phonetic features of articulation with quantitative data. Doing so makes it possible to define the base data set of visemes, the visual counterparts of phonemes, quantitatively, and thus to provide accurate input for visual speech synthesis (a talking head that supports the speech production training of deaf and hard of hearing children). Measurement-based features extend the existing data and refine our previously used dynamic model of articulation. This endeavour requires two major types of data to be defined simultaneously: (1) information on the shape of the mouth, which can be examined relatively simply in an ordinary camera image; and (2) parameters describing the position of the tongue, which can only be obtained with medical-level imaging devices and by processing their signals. The place of articulation of sounds can be described by the shape and position of the tongue. For vowels, we estimate the tongue position from the centroid of the tongue; for consonants, we define the place of articulation from the measured distance of the tongue from the palate. In our examinations, we use dynamic MRI images and determine the relevant tongue contours with automatic algorithms. The analysis yields a data set that statically defines the articulatory key frames, which play an important role in visual speech synthesis; each key frame fixes the tongue position belonging purely to the given speech sound, without the properties of sound transitions. We apply our results to improve the existing Hungarian transparent talking head with a more accurate model based on the clarification of the dynamic features. This includes the temporal tracking of articulatory features and the implementation of interpolation between key frames, which facilitates the description of coarticulatory effects.
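
The three quantities named in the abstract (a tongue centroid for vowels, a tongue-to-palate distance for consonants, and interpolation between key frames) can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that each contour is already available as a list of (x, y) points in arbitrary units, that the centroid is the plain average of the contour points, that the distance is the smallest point-to-point distance, and that interpolation is linear between two contours with the same number of points. All function names and sample coordinates are hypothetical.

```python
from math import hypot


def contour_centroid(points):
    """Average of the contour sample points (assumption: not an area centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def min_tongue_palate_distance(tongue, palate):
    """Smallest point-to-point distance, and the tongue point where it occurs."""
    return min(
        ((hypot(tx - px, ty - py), (tx, ty))
         for tx, ty in tongue
         for px, py in palate),
        key=lambda item: item[0],
    )


def interpolate_key_frames(frame_a, frame_b, t):
    """Linear blend of two equally sampled contours; t runs from 0 to 1."""
    return [
        ((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
        for (ax, ay), (bx, by) in zip(frame_a, frame_b)
    ]


# Made-up contours, for illustration only (not measured MRI data).
tongue_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
tongue_b = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
palate = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]

centroid = contour_centroid(tongue_a)
distance, closest_point = min_tongue_palate_distance(tongue_a, palate)
halfway = interpolate_key_frames(tongue_a, tongue_b, 0.5)
```

In this sketch, the tongue point closest to the palate serves as a rough stand-in for the place of constriction, and the blended contour stands in for an intermediate frame between two key frames. A real pipeline would first need contour extraction from the MRI images and a resampling step so that the two contours correspond point by point.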

Published
2022-01-29