Related papers: Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation

Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation

URL: http://arxiv.org/abs/2411.10927v1
Date: Sun, 17 Nov 2024 01:15:58 GMT
Title: Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation
Authors: Jisang Park, Minu Kim, DaYoung Hong, Jongha Lee,
Abstract summary: learners of a second language (L2) often unconsciously substitute unfamiliar L2 phonemes with similar phonemes from their native language (L1) This phonemic substitution leads to deviations from the standard phonological patterns of the L2. We propose Inter-linguistic Phonetic Composition (IPC), a novel computational method designed to minimize incorrect phonological transfer.
Score: 1.3024517678456733
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Learners of a second language (L2) often unconsciously substitute unfamiliar L2 phonemes with similar phonemes from their native language (L1), even though native speakers of the L2 perceive these sounds as distinct and non-interchangeable. This phonemic substitution leads to deviations from the standard phonological patterns of the L2, creating challenges for learners in acquiring accurate L2 pronunciation. To address this, we propose Inter-linguistic Phonetic Composition (IPC), a novel computational method designed to minimize incorrect phonological transfer by reconstructing L2 phonemes as composite sounds derived from multiple L1 phonemes. Tests with two automatic speech recognition models demonstrated that when L2 speakers produced IPC-generated composite sounds, the recognition rate of target L2 phonemes improved by 20% compared to when their pronunciation was influenced by original phonological transfer patterns. The improvement was observed within a relatively shorter time frame, demonstrating rapid acquisition of the composite sound.

Related papers

FROST-EMA: Finnish and Russian Oral Speech Dataset of Electromagnetic Articulography Measurements with L1, L2 and Imitated L2 Accents [44.93009303381237]
We introduce a new FROST-EMA (Finnish and Russian Oral Speech dataset of Electromagnetic Articulography) corpus.<n>It consists of 18 bilingual speakers, who produced speech in their native language (L1), second language (L2) and imitated L2 (fake foreign accent)<n>The new corpus enables research into language variability from phonetic and technological points of view.
arXiv Detail & Related papers (2025-06-10T16:52:11Z)
LLM-based phoneme-to-grapheme for phoneme-based speech recognition [11.552927239284582]
We propose phoneme-to-grapheme (LLM-P2G) decoding for phoneme-based automatic speech recognition (ASR)<n>Our experimental results show that LLM-P2G outperforms WFST-based systems in crosslingual ASR for Polish and German, by relative WER reductions of 3.6% and 6.9% respectively.
arXiv Detail & Related papers (2025-06-05T07:35:55Z)
Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs.<n>It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z)
A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings [12.29892010056753]
An ideal form of feedback for L2 speakers should be so fine-grained that it enables them to detect and diagnose unintelligible parts of utterances. This pilot study utilizes a unique semi-parallel dataset composed of non-native speakers' (L2) reading aloud, shadowing of native speakers (L1) and their script-shadowing utterances. We explore the technical possibility of replicating the process of an L1 speaker's shadowing L2 speech using Voice Conversion techniques, to create a virtual shadower system.
arXiv Detail & Related papers (2024-10-03T06:24:56Z)
PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research. We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs. We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
L1-aware Multilingual Mispronunciation Detection Framework [10.15106073866792]
This paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. Experiments demonstrate the effectiveness of the proposed L1-MultiMDD framework on both seen -- L2-ARTIC, LATIC, and AraVoiceL2v2; and unseen -- EpaDB and Speechocean762 datasets.
arXiv Detail & Related papers (2023-09-14T13:53:17Z)
BiPhone: Modeling Inter Language Phonetic Influences in Text [12.405907573933378]
A large number of people are forced to use the Web in a language they have low literacy in due to technology asymmetries. Written text in the second language (L2) from such users often contains a large number of errors that are influenced by their native language (L1) We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. These confusions are then plugged into a generative model (Bi-Phone) for synthetically producing corrupted L2 text.
arXiv Detail & Related papers (2023-07-06T22:31:55Z)
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition [2.8360662552057323]
This study is on the efficient incorporation of the L2 phonemes, which in this work refer to Korean phonemes, through articulatory feature analysis. We employ the lattice-free maximum mutual information (LF-MMI) objective in an end-to-end manner, to train the acoustic model to align and predict one of multiple pronunciation candidates. Experimental results show that the proposed method improves ASR accuracy for Korean L2 speech by training solely on L1 speech data.
arXiv Detail & Related papers (2023-06-05T01:55:33Z)
On Prosody Modeling for ASR+TTS based Voice Conversion [82.65378387724641]
In voice conversion, an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents. Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity. We propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP)
arXiv Detail & Related papers (2021-07-20T13:30:23Z)
Weakly-supervised word-level pronunciation error detection in non-native English speech [14.430965595136149]
weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. Phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. Compared to state-of-the-art approaches, we improve the accuracy of detecting word-level pronunciation errors in AUC metric by 30% on the GUT Isle Corpus of L2 Polish speakers, and by 21.5% on the Isle Corpus of L2 German and Italian speakers.
arXiv Detail & Related papers (2021-06-07T10:31:53Z)
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification. We present results for two languages families - Indic languages and Romance languages, for two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning. A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values. A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z)
AlloVera: A Multilingual Allophone Database [137.3686036294502]
AlloVera provides mappings from 218 allophones to phonemes for 14 languages. We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task.
arXiv Detail & Related papers (2020-04-17T02:02:18Z)
Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute. Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.