Enhancing nonnative speech perception and production through an AI-powered application
- URL: http://arxiv.org/abs/2503.22705v1
- Date: Tue, 18 Mar 2025 10:05:12 GMT
- Title: Enhancing nonnative speech perception and production through an AI-powered application
- Authors: Georgios P. Georgiou
- Abstract summary: The aim of this study is to examine the impact of training with an AI-powered mobile application on nonnative sound perception and production. The intervention involved training with the Speakometer mobile application, which incorporated recording tasks featuring the English vowels, along with pronunciation feedback and practice. The results revealed significant improvements in both discrimination accuracy and production of the target contrast following the intervention.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While research on using Artificial Intelligence (AI) through various applications to enhance foreign language pronunciation is expanding, it has primarily focused on aspects such as comprehensibility and intelligibility, largely neglecting the improvement of individual speech sounds in both perception and production. This study seeks to address this gap by examining the impact of training with an AI-powered mobile application on nonnative sound perception and production. Participants completed a pretest assessing their ability to discriminate the second language English heed-hid contrast and produce these vowels in sentence contexts. The intervention involved training with the Speakometer mobile application, which incorporated recording tasks featuring the English vowels, along with pronunciation feedback and practice. The posttest mirrored the pretest to measure changes in performance. The results revealed significant improvements in both discrimination accuracy and production of the target contrast following the intervention. However, participants did not achieve native-like competence. These findings highlight the effectiveness of AI-powered applications in facilitating speech acquisition and support their potential use for personalized, interactive pronunciation training beyond the classroom.
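The study's design reduces to a within-subjects, pretest/posttest comparison of discrimination accuracy (and, analogously, production ratings). As a minimal sketch of that analysis, assuming one accuracy score per participant per time point and using entirely hypothetical numbers rather than the study's data:

```python
# Minimal pretest/posttest sketch: paired comparison of each learner's
# discrimination accuracy before and after app-based training.
# All scores below are hypothetical placeholders, not the study's data.
from scipy import stats

pretest  = [0.55, 0.60, 0.52, 0.58, 0.63, 0.57]   # proportion correct before training
posttest = [0.71, 0.68, 0.66, 0.74, 0.79, 0.70]   # proportion correct after training

# Paired t-test: did accuracy change within participants?
t_stat, p_value = stats.ttest_rel(posttest, pretest)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same paired structure applies to the production measure, with rated vowel accuracy in place of discrimination scores.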
Related papers
- "It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services [3.8931913630405393]
This study evaluates two synthetic AI voice services (Speechify and ElevenLabs) through a mixed-methods approach.
Our findings reveal technical performance disparities across five regional, English-language accents.
Current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination.
arXiv Detail & Related papers (2025-04-12T21:31:22Z)
- Listening for Expert Identified Linguistic Features: Assessment of Audio Deepfake Discernment among Undergraduate Students
This paper evaluates the impact of training undergraduate students to improve their audio deepfake discernment ability by listening for expert-defined linguistic features.
Our research goes beyond informational training by introducing targeted linguistic cues to listeners as a deepfake discernment mechanism.
Findings show a statistically significant decrease in the experimental group's uncertainty when evaluating audio clips and an improvement in their ability to correctly identify clips they were initially unsure about.
arXiv Detail & Related papers (2024-11-21T20:52:02Z)
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future end-of-utterance (EOU) events up to 300 ms before the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
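The cross-attention fusion described in the entry above can be illustrated with a standard attention layer in which linguistic token states query acoustic frame features. This is a schematic reconstruction, not the paper's implementation; the class name, dimensions, and prediction heads are assumptions:

```python
# Schematic sketch (not the paper's code): linguistic states attend over
# acoustic states via cross-attention, then feed two heads, one for
# upcoming-word prediction and one for time-until-EOU regression.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, vocab_size=1000):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.word_head = nn.Linear(d_model, vocab_size)  # next-word logits
        self.eou_head = nn.Linear(d_model, 1)            # seconds until end of utterance

    def forward(self, text_states, audio_states):
        # text_states:  (batch, n_tokens, d_model) linguistic features
        # audio_states: (batch, n_frames, d_model) acoustic features
        fused, _ = self.attn(query=text_states, key=audio_states, value=audio_states)
        return self.word_head(fused), self.eou_head(fused)

# Random tensors stand in for real encoder outputs:
words, eou = CrossAttentionFusion()(torch.randn(2, 10, 256), torch.randn(2, 50, 256))
```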
- Learning Through AI-Clones: Enhancing Self-Perception and Presentation Performance
A mixed-design experiment with 44 international students compared self-recording videos (self-recording group) to AI-clone videos (AI-clone group) for online English presentation practice. Results showed that AI clones functioned as positive "role models" for facilitating social comparisons.
arXiv Detail & Related papers (2023-10-23T17:20:08Z)
- High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
We propose a novel automatic voice over (AVO) method leveraging the learning objective of self-supervised discrete speech unit prediction.
Experimental results show that our proposed method achieves remarkable lip-speech synchronization and high speech quality.
arXiv Detail & Related papers (2023-06-29T15:02:22Z)
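Self-supervised discrete speech units of the kind used above are commonly obtained by clustering frame-level features from a pretrained encoder and treating the cluster IDs as prediction targets. A hedged sketch of that unit-extraction step, with random arrays standing in for real encoder features:

```python
# Typical recipe for discrete speech units: run k-means over frame-level
# self-supervised features, then use the cluster IDs as unit targets.
# The feature matrix here is a random placeholder, not real audio features.
import numpy as np
from sklearn.cluster import KMeans

features = np.random.randn(5000, 768)          # (frames, feature_dim), hypothetical
kmeans = KMeans(n_clusters=100, n_init=10).fit(features)

units = kmeans.predict(features)               # one discrete unit ID per frame
print(units[:10])                              # e.g. [17 17 42 42 3 ...]
```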
- Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition
This study focuses on the efficient incorporation of L2 phonemes, which in this work refer to Korean phonemes, through articulatory feature analysis.
We employ the lattice-free maximum mutual information (LF-MMI) objective in an end-to-end manner to train the acoustic model to align and predict one of multiple pronunciation candidates.
Experimental results show that the proposed method improves ASR accuracy for Korean L2 speech by training solely on L1 speech data.
arXiv Detail & Related papers (2023-06-05T01:55:33Z)
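For context, the MMI criterion behind LF-MMI maximizes the likelihood of the reference transcription relative to all competing hypotheses; the lattice-free variant computes the denominator over a phone-level graph instead of word lattices. In a standard textbook form (not copied from the paper), with acoustic scale $\kappa$:

```latex
% Standard MMI objective; LF-MMI evaluates the denominator sum exactly
% over a phone-level denominator graph rather than word lattices.
\mathcal{F}_{\mathrm{MMI}} = \sum_{u}
  \log \frac{p(\mathbf{O}_u \mid \mathcal{M}_{W_u})^{\kappa} \, P(W_u)}
            {\sum_{W'} p(\mathbf{O}_u \mid \mathcal{M}_{W'})^{\kappa} \, P(W')}
```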
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
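As a concrete illustration of feeding pretrained speech representations to a downstream E2E-ASR model, torchaudio ships wav2vec 2.0 bundles whose frame-level features can be extracted directly. This is a generic example, not the paper's exact setup or checkpoints:

```python
# Extract frame-level representations from torchaudio's bundled
# wav2vec 2.0 model; a downstream E2E-ASR encoder would consume these.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform = torch.randn(1, bundle.sample_rate)       # 1 s of placeholder audio
with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # list with one tensor per layer
print(features[-1].shape)                           # (batch, frames, 768)
```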
- Structural Pre-training for Dialogue Comprehension
We present SPIDER (Structural Pre-traIned DialoguE Reader) to capture dialogue-exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
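UniSpeech's unified objective pairs a supervised phonetic loss on labeled speech with a self-supervised contrastive loss on unlabeled speech. The weighting below is a schematic sketch of that multitask recipe, not the paper's exact formulation:

```python
# Schematic multitask combination: supervised CTC loss from labeled
# batches plus contrastive loss from unlabeled batches, mixed by alpha.
import torch

def unified_loss(ctc_loss, contrastive_loss, alpha=0.5):
    # alpha trades off the supervised and self-supervised objectives
    return alpha * ctc_loss + (1.0 - alpha) * contrastive_loss

print(unified_loss(torch.tensor(2.3), torch.tensor(4.1)))  # tensor(3.2000)
```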
- Visual-speech Synthesis of Exaggerated Corrective Feedback
We propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT).
The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron.
We show that exaggerated feedback outperforms the non-exaggerated version in helping learners with pronunciation identification and pronunciation improvement.
arXiv Detail & Related papers (2020-09-12T08:37:22Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
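The self-adaptation idea above, deriving a speaker representation from the test utterance itself and feeding it back as an auxiliary feature, can be sketched as follows. This is a schematic reconstruction; the pooling, layer sizes, and masking approach are assumptions:

```python
# Schematic speaker-aware self-adaptation: pool an utterance-level
# embedding, broadcast it to every frame, and condition a masking
# network on it. Shapes and layers are illustrative assumptions.
import torch
import torch.nn as nn

class SpeakerAdaptiveEnhancer(nn.Module):
    def __init__(self, n_bins=257, spk_dim=64, hidden=256):
        super().__init__()
        self.spk_encoder = nn.Linear(n_bins, spk_dim)    # stand-in speaker encoder
        self.enhancer = nn.Sequential(
            nn.Linear(n_bins + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),     # per-bin mask in [0, 1]
        )

    def forward(self, spec):
        # spec: (batch, frames, n_bins) noisy magnitude spectrogram
        spk = self.spk_encoder(spec).mean(dim=1, keepdim=True)  # utterance-level embedding
        spk = spk.expand(-1, spec.size(1), -1)                  # broadcast to all frames
        mask = self.enhancer(torch.cat([spec, spk], dim=-1))
        return spec * mask                                      # enhanced spectrogram

enhanced = SpeakerAdaptiveEnhancer()(torch.rand(2, 100, 257))
```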
This list is automatically generated from the titles and abstracts of the papers on this site.