Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
- URL: http://arxiv.org/abs/2601.21205v1
- Date: Thu, 29 Jan 2026 03:12:11 GMT
- Title: Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
- Authors: Eunjung Yeo, Julie M. Liss, Visar Berisha, David R. Mortensen,
- Abstract summary: The prevalence of neurological disorders associated with dysarthria motivates the need for automated intelligibility assessment methods.<n>We present a multilingual phoneme-production assessment framework that integrates universal phone recognition with language-specific phoneme interpretation.
- Score: 22.333214778384487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing prevalence of neurological disorders associated with dysarthria motivates the need for automated intelligibility assessment methods that are applicalbe across languages. However, most existing approaches are either limited to a single language or fail to capture language-specific factors shaping intelligibility. We present a multilingual phoneme-production assessment framework that integrates universal phone recognition with language-specific phoneme interpretation using contrastive phonological feature distances for phone-to-phoneme mapping and sequence alignment. The framework yields three metrics: phoneme error rate (PER), phonological feature error rate (PFER), and a newly proposed alignment-free measure, phoneme coverage (PhonCov). Analysis on English, Spanish, Italian, and Tamil show that PER benefits from the combination of mapping and alignment, PFER from alignment alone, and PhonCov from mapping. Further analyses demonstrate that the proposed framework captures clinically meaningful patterns of intelligibility degradation consistent with established observations of dysarthric speech.
Related papers
- Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio [52.859261069569165]
We propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation.<n>We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or better than state-of-the-art models specialized for individual tasks.
arXiv Detail & Related papers (2025-08-28T06:51:42Z) - LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention [2.199918533021483]
The overlap between vocal traits such as accent, vocal anatomy, and a language's phonetic structure complicates separating linguistic and speaker information.<n>Disentangling these components can significantly improve speaker recognition accuracy.<n>We propose a novel disentanglement learning strategy that integrates joint learning through prefix-tuned cross-attention.
arXiv Detail & Related papers (2025-06-02T10:59:31Z) - Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs.<n>It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z) - Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities
on Multilingual Speech Recognition [31.575930914290762]
A novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities.
Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form.
A relative improvement of 8% over monolingual counterpart is achieved.
arXiv Detail & Related papers (2022-07-07T15:55:41Z) - Differentiable Allophone Graphs for Language-Universal Speech
Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z) - Multilingual and crosslingual speech recognition using
phonological-vector based phone embeddings [20.93287944284448]
We propose to join phonology driven phone embedding (top-down) and deep neural network (DNN) based acoustic feature extraction (bottom-up) to calculate phone probabilities.
No inversion from acoustics to phonological features is required for speech recognition.
Experiments are conducted on the CommonVoice dataset (German, French, Spanish and Italian) and the AISHLL-1 dataset (Mandarin)
arXiv Detail & Related papers (2021-07-11T12:56:47Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z) - AlloVera: A Multilingual Allophone Database [137.3686036294502]
AlloVera provides mappings from 218 allophones to phonemes for 14 languages.
We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task.
arXiv Detail & Related papers (2020-04-17T02:02:18Z) - Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.