Error-preserving Automatic Speech Recognition of Young English Learners' Language
- URL: http://arxiv.org/abs/2406.03235v1
- Date: Wed, 5 Jun 2024 13:15:37 GMT
- Title: Error-preserving Automatic Speech Recognition of Young English Learners' Language
- Authors: Janick Michot, Manuela Hürlimann, Jan Deriu, Luzia Sauer, Katsiaryna Mlynchyk, Mark Cieliebak,
- Abstract summary: One of the central skills that language learners need to practice is speaking the language.
Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills.
We build an ASR system that works on spontaneous speech by young language learners and preserves their errors.
- Score: 6.491559928368298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the central skills that language learners need to practice is speaking the language. Currently, students in school do not get enough speaking opportunities and lack conversational practice. Recent advances in speech technology and natural language processing allow for the creation of novel tools to practice their speaking skills. In this work, we tackle the first component of such a pipeline, namely, the automated speech recognition module (ASR), which faces a number of challenges: first, state-of-the-art ASR models are often trained on adult read-aloud data by native speakers and do not transfer well to young language learners' speech. Second, most ASR systems contain a powerful language model, which smooths out errors made by the speakers. To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the errors made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors. For this, we collected a corpus containing around 85 hours of English audio spoken by learners in Switzerland from grades 4 to 6 on different language learning tasks, which we used to train an ASR model. Our experiments show that our model benefits from direct fine-tuning on children's voices and has a much higher error preservation rate than other models.
Related papers
- Multilingual self-supervised speech representations improve the speech
recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data finetuning self-supervised representations is a better performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z) - Automatic Speech Recognition of Non-Native Child Speech for Language
Learning Applications [18.849741353784328]
We assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI.
We evaluate their performance on read and extemporaneous speech of native and non-native Dutch children.
arXiv Detail & Related papers (2023-06-29T06:14:26Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - A Survey of Multilingual Models for Automatic Speech Recognition [6.657361001202456]
Cross-lingual transfer is an attractive solution to the problem of multilingual Automatic Speech Recognition.
Recent advances in Self Supervised Learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models.
We present best practices for building multilingual models from research across diverse languages and techniques.
arXiv Detail & Related papers (2022-02-25T09:31:40Z) - Discovering Phonetic Inventories with Crosslingual Automatic Speech
Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z) - Continual-wav2vec2: an Application of Continual Learning for
Self-Supervised Automatic Speech Recognition [0.23872611575805824]
We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL)
Wav2vec models perform SSL on raw audio in a pretraining phase and then finetune on a small fraction of annotated data.
We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining a new language task.
arXiv Detail & Related papers (2021-07-26T10:39:03Z) - How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and imposing a too strong model can hurt the zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z) - LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.