Analysis of Disfluency in Children's Speech
- URL: http://arxiv.org/abs/2010.04293v1
- Date: Thu, 8 Oct 2020 22:51:25 GMT
- Title: Analysis of Disfluency in Children's Speech
- Authors: Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf
- Abstract summary: We present a novel dataset with annotated disfluencies of spontaneous explanations from 26 children (ages 5--8).
Children have higher disfluency and filler rates, tend to use nasal filled pauses more frequently, and on average exhibit longer reparandums than repairs.
Despite the differences, an automatic disfluency detection system trained on adult (Switchboard) speech transcripts performs reasonably well on children's speech.
- Score: 25.68434431663045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disfluencies are prevalent in spontaneous speech, as shown in many studies of
adult speech. Less is understood about children's speech, especially in
pre-school children who are still developing their language skills. We present
a novel dataset with annotated disfluencies of spontaneous explanations from 26
children (ages 5--8), interviewed twice over a year-long period. Our
preliminary analysis reveals significant differences between children's speech
in our corpus and adult spontaneous speech from two corpora (Switchboard and
CallHome). Children have higher disfluency and filler rates, tend to use nasal
filled pauses more frequently, and on average exhibit longer reparandums than
repairs, in contrast to adult speakers. Despite the differences, an automatic
disfluency detection system trained on adult (Switchboard) speech transcripts
performs reasonably well on children's speech, achieving an F1 score that is
10% higher than the score on an adult out-of-domain dataset (CallHome).
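For context on the F1 comparison above: disfluency detection is commonly evaluated as token-level tagging, with F1 computed over the tokens labeled disfluent. A minimal sketch of that computation, using hypothetical labels (this is illustrative, not the authors' evaluation code):

```python
def f1_score(gold, pred):
    """Token-level F1 for binary disfluency tags (1 = disfluent token)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical tags for "I uh I want that": gold marks "I" and "uh" as
# disfluent; the system only catches the first token.
gold = [1, 1, 0, 0, 0]
pred = [1, 0, 0, 0, 0]
print(round(f1_score(gold, pred), 3))  # → 0.667
```

The 10% gap reported above is a difference in this score between the children's corpus and CallHome, both out-of-domain relative to the Switchboard training data.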
Related papers
- Evaluation of state-of-the-art ASR Models in Child-Adult Interactions [27.30130353688078]
Speech foundation models show a noticeable performance drop (15-20% absolute WER) for child speech compared to adult speech in the conversational setting.
We employ LoRA on the best-performing zero-shot model (whisper-large) to probe the effectiveness of fine-tuning in a low-resource setting.
arXiv Detail & Related papers (2024-09-24T14:42:37Z)
- Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition [48.29355616574199]
We analyze the transferability of emotion recognition across three languages: English, Mandarin Chinese, and Cantonese.
This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method.
arXiv Detail & Related papers (2023-06-26T08:48:08Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations [2.2191297646252646]
Children's speech recognition is a vital, yet largely overlooked domain when building inclusive speech technologies.
Recent advances in self-supervised learning have created a new opportunity for overcoming this problem of data scarcity.
We leverage self-supervised adult speech representations and use three well-known child speech corpora to build models for children's speech recognition.
arXiv Detail & Related papers (2022-11-14T22:03:36Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of the target language are very rare.
We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- How Adults Understand What Young Children Say [1.416276307599112]
Children's early speech often bears little resemblance to adult speech in form or content, and yet caregivers often find meaning in young children's utterances.
We propose that successful early communication relies not just on children's growing linguistic knowledge, but also on adults' sophisticated inferences.
arXiv Detail & Related papers (2022-06-15T20:37:32Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
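The WER metric cited in this entry is the standard word-level edit distance normalized by reference length. A generic sketch of the computation (illustrative only; not this paper's scoring pipeline):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over words / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution/match
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One deleted word out of six reference words:
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # → 0.167
```

An "absolute WER reduction" of 2.92% means this ratio drops by 0.0292 (e.g. from 0.40 to 0.3708), independent of the baseline's value.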
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Child-directed Listening: How Caregiver Inference Enables Children's Early Verbal Communication [2.9331097393290837]
We employ a suite of Bayesian models of spoken word recognition to understand how adults overcome the noisiness of child language.
By evaluating competing models on phonetically-annotated corpora, we show that adults' recovered meanings are best predicted by prior expectations fitted specifically to the child language environment.
arXiv Detail & Related papers (2021-02-06T00:54:34Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
- Learning to Understand Child-directed and Adult-directed Speech [18.29692441616062]
Human language acquisition research indicates that child-directed speech helps language learners.
We compare the task performance of models trained on adult-directed speech (ADS) and child-directed speech (CDS).
We find indications that CDS helps in the initial stages of learning, but eventually, models trained on ADS reach comparable task performance, and generalize better.
arXiv Detail & Related papers (2020-05-06T10:47:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.