Beszédtudomány - Speech Science

Acoustic analysis of isolated Mandarin Chinese lexical tones in the pronunciation of Hungarian native speakers learning Chinese

2025-12-08T08:46:26+00:00

The aim of the experiment is to provide an acoustic-phonetic investigation of how Hungarian learners of Mandarin produce isolated Chinese lexical tones. Mandarin Chinese (MC) contrasts four lexical tones: high level Tone 1 (T1), rising Tone 2 (T2), low falling-rising Tone 3 (T3) and falling Tone 4 (T4). These four tones differ not only in their f0 contour; their duration also serves as an acoustic cue for differentiation. Duration increases across the tones in the following order: T4 < T2 < T1 < T3. The primary focus of the study is to compare the production of L2 learners and native MC speakers on two acoustic characteristics: the duration of the lexical tones and the shape of the f0 contours. The production of two L2 learner groups (beginners and advanced learners) was compared to that of a native MC control group (8 speakers per group, 24 speakers in total, all women). Speakers were asked to read meaningful words with a CV structure (ma syllables) carrying the four lexical tones under analysis. The acoustic analysis compared the duration of the vocalic section and the f0 contours of the individual tonal realizations across the three speaker groups. The results show that both L2 learner groups produced Mandarin lexical tones with native-like durational characteristics, with one exception: advanced learners produced T4 with a significantly shorter duration than the native pattern, possibly due to hyperarticulation. The production of the high level T1 did not pose any difficulties for either L2 learner group, whereas beginners realized the T2 contour with a significantly lower f0 minimum, leading to a steeper rising phase relative to the native curve. Advanced learners’ T2 was identical in shape to the native curve. As for T3, both L2 learner groups produced a compressed f0 range compared to native MC realizations.
Finally, the domed f0 pattern of T4 posed problems for beginners, whose production was characterized by a linear pattern instead of the convex curve observed in native speakers’ production. To the author's knowledge, this study is the first to provide statistically validated results on the acoustic comparison of isolated lexical tone production by Hungarian learners of Mandarin.

The structural position and discourse-marking function of izé in spontaneous conversations among friends

2025-12-08T08:38:12+00:00

Native speakers of Hungarian often consider izé a stigmatized, functionless filler word (cf. Grétsy & Kovalovszky, 1980); however, it plays an important role in spontaneous conversations and speech planning processes. Previous literature has distinguished two types of izé: word-substituting and time-gaining. While the former can address word-retrieval problems and vocabulary gaps, the latter can help solve problems in speech planning (Fabulya, 2007; Gyarmathy, 2012; Gyarmathy & Neuberger, 2013; Gyarmathy, 2015; Kondacs, 2017; Marcsenkoné Kondacs, 2023). The study investigates the phenomenon from a new, conversation-analytic perspective using both qualitative and quantitative methods. It examines the structural position of izé in Hungarian conversations. The corpus under study contains audio recordings of spontaneous, naturally occurring Hungarian conversations among friends, collected and analyzed by the author. The analysis differentiates between five structural categories (cf. Németh, 2021) and examines the functions of word-substituting and time-gaining izés in these positions. It seeks to answer the question of whether there are further functions beyond those identified so far. It also examines whether izé can be used in a quotative function in the Hungarian corpus, like the be + like formula in English. The analysis shows that izé can have a discourse-organizing, discourse-marking function and that it can influence the turn-taking system.

Difficulties in perception of Mandarin Chinese vowels [ɤ] and [ɿ]/[ʅ] by Hungarian learners of Mandarin

2025-12-08T09:04:08+00:00

This study examines the identification of the Mandarin Chinese (hereafter: Chinese) vowels [ɤ], [ɿ] and [ʅ] by Hungarian learners of Chinese, as these vowels neither belong to the vowel phoneme system of Hungarian nor appear as allophones in it. Our aim is to investigate the perceptual patterns of these sounds and, within this scope, how well learners at different learning stages distinguish the mid vowel from the high vowels. We also aim to explore the potential sources of the observed patterns.

Twenty-one beginners (in their first semester of learning Chinese) and 10 intermediate learners (in their fifth semester of learning Chinese) were studied. The beginner group was further divided into three sub-groups: (a) one taught exclusively by a native Chinese speaker, (b) one taught solely by Hungarian L2 speakers of Chinese, and (c) one taught by both a native Chinese speaker and a Hungarian L2 speaker of Chinese. The intermediate group was also taught by both types of teacher. Using an X(AB) identification test, we investigated the identification of the Chinese vowels [ɤ] and [ɿ]/[ʅ] among Hungarian learners of Chinese.

The rate of correct identification was lower for [ɤ] than for the high vowels. The findings also suggest that including a native speaker as a teacher significantly improves learners’ perception. Finally, the intermediate group did not perform better than the beginner groups, which we attribute to the influence of longer exposure to orthographic input and to the students’ memory.

The results indicate that the type of language experience (the teacher) and the writing system together affect perceptual identification.

Changes in the results of voice biometric systems using different technologies for different speech tasks and voice sample lengths

2025-12-08T09:53:52+00:00

Abstract
During forensic speaker comparison, the appointed audio forensics expert works with audio recordings of different types and durations. Differences in speech sample type and duration affect the probability data. To evaluate biometric identification results, the probability value of the data obtained must be determined so that the expert’s report is accurate and can be interpreted by the other actors in the public proceedings. In the present study, the speech samples of 78 speakers from the forensic voice sample database were compared within the framework of the FORENSICSpeech research project (Beke et al. 2020). The samples include three types of speech: spontaneous speech, read speech, and narration. The recordings were repeated after an average of two weeks, and the audio files were then automatically cut into segments of 20, 40, 60, 80, 100, and 120 seconds. The aim of this study is to show how different speech styles and durations affect voice biometric identification results.
The results show that the EER², FRR³, Cllr⁴ and Cllr-min values decrease with increasing duration; however, within the 20–120-second range the change is not continuous. Similarly, the lowest EER, FRR, Cllr, and Cllr-min values occur for spontaneous speech, followed by narration, while the speech samples of information exchange give the highest Cllr values. Overall, the more advanced i-vector method tends to provide more efficient person identification results with lower error rates.

1 Gaussian Mixture Model – Universal Background Model
2 Equal Error Rate
3 False Reject Rate
4 Cost Likelihood Ratio
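
As a rough illustration of the metrics listed in the footnotes above, the following minimal Python sketch computes FRR, an approximate EER, and Cllr from comparison scores. It is not the evaluation code of the study: the function names and the toy scores are hypothetical, and the scores are assumed to be natural-log likelihood ratios from same-speaker ("target") and different-speaker ("nontarget") comparisons.

```python
import math


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost (Cllr); inputs assumed to be natural-log LRs."""
    # Same-speaker trials are penalised when their LLR is low.
    c_tar = sum(math.log2(1 + math.exp(-s)) for s in target_llrs) / len(target_llrs)
    # Different-speaker trials are penalised when their LLR is high.
    c_non = sum(math.log2(1 + math.exp(s)) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (c_tar + c_non)


def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false reject rate, false accept rate) at a decision threshold."""
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return frr, far


def eer(target_scores, nontarget_scores):
    """Approximate EER: try every observed score as a threshold and
    report the mean of FRR and FAR where the two rates are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr, far = error_rates(target_scores, nontarget_scores, t)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (frr + far) / 2
    return best_rate


# Toy scores (hypothetical, not data from the study).
same_speaker = [1.0, 3.0]
different_speaker = [-1.0, 2.0]
print(eer(same_speaker, different_speaker))
print(cllr(same_speaker, different_speaker))
```

A system whose scores carry no information (all log likelihood ratios equal to 0) yields a Cllr of 1, so values well below 1 indicate useful, reasonably calibrated scores. Production toolkits usually interpolate between thresholds to estimate EER; the threshold scan here is a simplification.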

Reproducing speech melody from musical notation

2025-12-08T09:05:31+00:00

Using the procedure presented in this paper, we apply speech melodies represented in musical notation to natural utterances. The development of the method is motivated by the fact that Fónagy and Magdics (1967) published the musical notation of many hundreds of speech melodies reflecting different emotions and attitudes, and we would like to examine how valid these notations are and whether they can be used for practical purposes. The paper presents the operations carried out in Praat that make the speech melodies audible. In future work, we plan perception experiments to determine whether listeners can decode, from the melodies reproduced in this way, the emotion or attitude that the melody expresses according to Fónagy and Magdics.
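
One step that any such procedure needs is turning notated pitches into target f0 values. The Python sketch below illustrates that conversion only; it is not the Praat procedure of the paper. It assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbers as the encoding of the notated pitches, and both function names are hypothetical.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 = A4); tuning is an assumption."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def transpose_to_speaker(midi_notes, speaker_base_hz, reference_midi):
    """Map notated pitches onto a speaker's range: the reference note is placed
    at speaker_base_hz and all semitone intervals relative to it are preserved."""
    return [speaker_base_hz * 2 ** ((n - reference_midi) / 12) for n in midi_notes]


# Hypothetical example: C4, E4, G4 placed so that C4 falls at 200 Hz.
print(transpose_to_speaker([60, 64, 67], 200.0, 60))
```

Preserving intervals rather than absolute pitch is a design choice in this sketch: it keeps the notated melody shape while fitting it to the pitch range of the speaker whose utterance is resynthesised.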