DDSupport: Language Learning Support System that Displays Differences
and Distances from Model Speech
- URL: http://arxiv.org/abs/2212.04930v1
- Date: Thu, 8 Dec 2022 05:49:15 GMT
- Title: DDSupport: Language Learning Support System that Displays Differences
and Distances from Model Speech
- Authors: Kazuki Kawamura, Jun Rekimoto
- Abstract summary: We propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners.
The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's pronunciation and that of a group of models.
- Score: 16.82591185507251
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: When beginners learn to speak a non-native language, it is difficult for them
to judge for themselves whether they are speaking well. Therefore,
computer-assisted pronunciation training systems are used to detect learner
mispronunciations. These systems typically compare the user's speech with that
of a specific native speaker, as a model, in units of rhythm, phonemes, or words,
and calculate the differences. However, they require extensive speech data with
detailed annotations or can compare only against one specific native speaker. To
overcome these problems, we propose a new language learning support system that
calculates speech scores and detects mispronunciations by beginners based on a
small amount of unannotated speech data, without comparison to a specific
person. The proposed system uses deep learning-based speech processing to
display the pronunciation score of the learner's speech and the
difference/distance between the learner's pronunciation and that of a group of models
in an intuitive, visual manner. Learners can gradually improve their
pronunciation by eliminating the differences and shortening the distance from the
models until they become sufficiently proficient. Furthermore, because the
pronunciation score and difference/distance are not computed against
specific sentences of a particular model, users are free to practice whichever
sentences they wish. We also built an application to help non-native speakers
learn English and confirmed that it can improve users' speech intelligibility.
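The core idea of scoring against a *group* of model speakers, rather than one specific reference, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes utterances have already been mapped to fixed-size embeddings by some speech model, and uses cosine distance to the group centroid as the "distance" being displayed; the function and variable names are hypothetical.

```python
import numpy as np

def pronunciation_score(learner_emb, model_embs):
    """Score learner speech by its distance to a group of model speakers.

    learner_emb: shape (d,) embedding of the learner's utterance
    model_embs:  shape (n, d) embeddings of n model speakers' utterances
    Returns (score, distance); a smaller distance yields a higher score.
    """
    centroid = model_embs.mean(axis=0)
    # Cosine similarity between the learner embedding and the group centroid
    cos = np.dot(learner_emb, centroid) / (
        np.linalg.norm(learner_emb) * np.linalg.norm(centroid))
    distance = 1.0 - cos  # 0 = same direction, 2 = opposite direction
    score = 100.0 / (1.0 + distance)
    return score, distance

# Toy demonstration with random "embeddings"
rng = np.random.default_rng(0)
models = rng.normal(size=(10, 16))
close = models.mean(axis=0) + 0.01 * rng.normal(size=16)  # near the group
far = -models.mean(axis=0)                                # opposite the group
score_close, d_close = pronunciation_score(close, models)
score_far, d_far = pronunciation_score(far, models)
```

Because the score depends only on the learner's utterance and the pooled model embeddings, no per-sentence annotation or single reference speaker is required, which mirrors why users of the system are free to choose their own practice sentences.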
Related papers
- Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach [14.5696754689252]
Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible.
We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations.
arXiv Detail & Related papers (2024-09-16T10:29:15Z)
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation [55.15299351110525]
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can recognize different languages with a single trained model.
We propose a novel training strategy that uses visual speech units.
We set a new state of the art in multilingual VSR, achieving performance comparable to previous language-specific VSR models.
arXiv Detail & Related papers (2024-01-18T08:46:02Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Fine-tuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, fine-tuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length.
arXiv Detail & Related papers (2023-09-27T17:21:13Z)
- A transfer learning based approach for pronunciation scoring [7.98890440106366]
Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators.
Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only.
We present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring.
arXiv Detail & Related papers (2021-11-01T14:37:06Z)
- Analysis of French Phonetic Idiosyncrasies for Accent Recognition [0.8602553195689513]
Differences in pronunciation, accent, and intonation of speech in general create one of the most common problems in speech recognition.
We use traditional machine learning techniques and convolutional neural networks, and show that the classical techniques are not sufficiently efficient to solve this problem.
In this paper, we focus on the French accent, and we identify the approach's limitations by examining the impact of French idiosyncrasies on its spectrograms.
arXiv Detail & Related papers (2021-10-18T10:50:50Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses the recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System [3.4888132404740797]
We evaluate a state-of-the-art automatic speech recognition model, using unseen data from a corpus with a wide variety of labeled English accents.
We show that there is indeed an accuracy bias in terms of accentual variety, favoring the accents most prevalent in the training corpus.
arXiv Detail & Related papers (2021-05-09T08:24:33Z)
- Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine whether knowledge from one particular language can be used to classify other languages.
We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.