Detection of disfluencies in speech signal
Katarzyna Barczewska, Magdalena Igras
kbarczew[at]agh[dot]edu[dot]pl, migras[at]agh[dot]edu[dot]pl
27 September 2013
Abstract: During public presentations or interviews, speakers commonly and unconsciously overuse interjections or filled pauses, which interfere with speech fluency and negatively affect listeners' impressions and speech perception. Types of disfluencies and methods of their detection are reviewed. The authors carried out a survey whose results indicated the elements most adverse to the audience. The article presents an approach to automatic detection of the most common type of disfluency: filled pauses. A base of patterns of filled pauses (prolonged I, prolonged e, mm, Im, xmm, using SAMPA notation) was collected from 72 minutes of recordings of public presentations and interviews of six speakers (3 male, 3 female). A statistical analysis of the length and frequency of occurrence of such interjections in the recordings is presented. Each pattern from the training set was then described by the mean values of its first and second formants (F1 and F2). Detection was performed on a test set of recordings by recognizing the phonemes using the two formants, with a recognition efficiency of about 68%. The results of research on disfluency detection in speech may be applied in a system that analyzes speech and provides feedback on imperfections that occurred during it, in order to help in oratorical skills training. A conceptual prototype of such an application is proposed. Moreover, a base of patterns of the most common disfluencies can be used in speech recognition systems to filter out interjections during speech-to-text transcription.

Keywords: speech processing, phoneme recognition, dynamics of speech, disfluencies of speech, elocution

Area: Biomedical Engineering
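
The formant-based detection step summarized in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the (F1, F2) centroid values are hypothetical placeholders rather than measurements from the recordings, and the nearest-centroid rule, the distance threshold max_dist, the frame step and the minimum duration min_duration are all assumed for the example. Per-frame formant estimates are taken as given; how they are extracted is outside the sketch.

```python
import math

# Hypothetical (F1, F2) centroids in Hz. Placeholder values for illustration
# only; they are NOT the pattern means reported by the authors.
PATTERNS = {
    "I": (400.0, 1900.0),
    "e": (550.0, 1700.0),
    "m": (280.0, 1100.0),
}


def classify_frame(f1, f2, patterns=PATTERNS, max_dist=200.0):
    """Return the label of the nearest pattern in the F1-F2 plane,
    or None when the nearest pattern is farther than max_dist (Hz)."""
    best_label, best_dist = None, float("inf")
    for label, (p1, p2) in patterns.items():
        dist = math.hypot(f1 - p1, f2 - p2)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_dist else None


def detect_filled_pauses(frames, frame_step=0.01, min_duration=0.2):
    """Group consecutive frames with the same label into segments.

    frames: sequence of (f1, f2) tuples, one per analysis frame.
    Returns a list of (start_s, end_s, label) for runs lasting at least
    min_duration seconds (an assumed threshold for a 'prolonged' sound).
    """
    segments = []
    run_label, run_start = None, 0
    # A sentinel frame at the end closes any run that is still open.
    for i, (f1, f2) in enumerate(list(frames) + [(None, None)]):
        label = None if f1 is None else classify_frame(f1, f2)
        if label != run_label:
            long_enough = (i - run_start) * frame_step >= min_duration
            if run_label is not None and long_enough:
                segments.append((run_start * frame_step, i * frame_step, run_label))
            run_label, run_start = label, i
    return segments
```

With 10 ms frames, a run of 30 consecutive frames close to the hypothetical "e" centroid would be reported as one 0.3 s candidate filled pause, while a run of only a few frames would be discarded as too short.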


