Building a Non-native Speech Corpus Featuring Chinese-English Bilingual
Children: Compilation and Rationale
- URL: http://arxiv.org/abs/2305.00446v2
- Date: Sun, 7 Jan 2024 17:17:00 GMT
- Title: Building a Non-native Speech Corpus Featuring Chinese-English Bilingual
Children: Compilation and Rationale
- Authors: Hiuchung Hung, Andreas Maier, Thorsten Piske
- Abstract summary: This paper introduces a non-native speech corpus consisting of narratives from fifty 5- to 6-year-old Chinese-English children.
Transcripts totaling 6.5 hours of children taking a narrative comprehension test in English (L2) are presented, along with human-rated scores and annotations of grammatical and pronunciation errors.
The children also completed the parallel MAIN tests in Chinese (L1) for reference purposes.
- Score: 3.924235219960689
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces a non-native speech corpus consisting of narratives
from fifty 5- to 6-year-old Chinese-English children. Transcripts totaling 6.5
hours of children taking a narrative comprehension test in English (L2) are
presented, along with human-rated scores and annotations of grammatical and
pronunciation errors. The children also completed the parallel MAIN tests in
Chinese (L1) for reference purposes. For all tests we recorded audio and video
with our innovative self-developed remote collection methods. The video
recordings serve to mitigate the challenge of low intelligibility in L2
narratives produced by young children during the transcription process. This
corpus offers valuable resources for second language teaching and has the
potential to enhance the overall performance of automatic speech recognition
(ASR).
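The abstract describes each narrative as paired with a holistic human rating and annotations of grammatical and pronunciation errors. As a purely illustrative sketch (the field names below are assumptions for illustration, not the corpus's actual release format), one such record could be represented in Python as:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema: field names are illustrative, not taken from the paper.
@dataclass
class ErrorAnnotation:
    span: str          # utterance fragment containing the error
    error_type: str    # e.g. "grammatical" or "pronunciation"
    correction: str    # target form supplied by the annotator

@dataclass
class NarrativeRecord:
    child_id: str                   # anonymised speaker identifier
    language: str                   # "L2-English" or "L1-Chinese" (parallel MAIN test)
    transcript: str                 # orthographic transcript of the narrative
    human_score: float              # holistic human rating
    errors: List[ErrorAnnotation] = field(default_factory=list)

# Example record, purely illustrative:
record = NarrativeRecord(
    child_id="C001",
    language="L2-English",
    transcript="the bird fly away because the cat come",
    human_score=3.0,
    errors=[ErrorAnnotation("fly", "grammatical", "flew")],
)
```

A structure along these lines would let the transcripts, scores, and error labels be consumed directly by language-teaching tools or ASR evaluation pipelines.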
Related papers
- Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives [15.669164862460342]
We develop automatic speech recognition systems for stories told by Afrikaans and isiXhosa preschool children.
We consider a range of prior child-speech ASR strategies to determine which is best suited to this unique setting.
arXiv Detail & Related papers (2025-01-11T08:11:09Z) - Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback [28.53752312060031]
Speak & Improve Corpus 2025 is a dataset of L2 learner English data with holistic scores and language error annotation.
The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems.
It is being made available for non-commercial use on the ELiT website.
arXiv Detail & Related papers (2024-12-16T17:07:26Z) - Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms [0.4207829324073153]
We propose an automated framework that uses software to classify speakers and to transcribe their utterances.
We compare results from our framework to those from a human expert for 110 minutes of classroom recordings.
The results suggest substantial progress in analyzing classroom speech that may support children's language development.
arXiv Detail & Related papers (2024-01-14T18:27:37Z) - Teacher Perception of Automatically Extracted Grammar Concepts for L2
Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning.
We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary).
We enlist language educators from schools in North America to perform a manual evaluation; they find the materials have potential for use in their lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z) - Voxtlm: unified decoder-only models for consolidating speech
recognition/synthesis and speech/text continuation tasks [61.3055230762097]
We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation.
VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning.
arXiv Detail & Related papers (2023-09-14T03:13:18Z) - Automatic Speech Recognition of Non-Native Child Speech for Language
Learning Applications [18.849741353784328]
We assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI.
We evaluate their performance on read and extemporaneous speech of native and non-native Dutch children.
arXiv Detail & Related papers (2023-06-29T06:14:26Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised
spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Hindi as a Second Language: Improving Visually Grounded Speech with
Semantically Similar Samples [89.16814518860357]
The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective.
Our key contribution in this work is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language.
arXiv Detail & Related papers (2023-03-30T16:34:10Z) - Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture
Videos into Multiple Indian Languages [5.17905382659474]
Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies.
This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically.
arXiv Detail & Related papers (2022-11-01T07:06:29Z) - Video-Guided Curriculum Learning for Spoken Video Grounding [65.49979202728167]
We introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions.
To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) method.
In addition, we collect the first large-scale spoken video grounding dataset based on ActivityNet.
arXiv Detail & Related papers (2022-09-01T07:47:01Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with
Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z) - Analysis of Disfluency in Children's Speech [25.68434431663045]
We present a novel dataset with annotated disfluencies of spontaneous explanations from 26 children (ages 5--8)
Children have higher disfluency and filler rates, tend to use nasal filled pauses more frequently, and on average exhibit longer reparandums than repairs.
Despite the differences, an automatic disfluency detection system trained on adult (Switchboard) speech transcripts performs reasonably well on children's speech.
arXiv Detail & Related papers (2020-10-08T22:51:25Z) - TLT-school: a Corpus of Non Native Children Speech [7.417312533172291]
This paper describes "TLT-school", a corpus of speech utterances collected in schools of northern Italy for assessing the performance of students learning both English and German.
The corpus was recorded in the years 2017 and 2018 from students aged between nine and sixteen years, attending primary, middle and high school.
All utterances have been scored, in terms of some predefined proficiency indicators, by human experts.
arXiv Detail & Related papers (2020-01-22T15:14:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.