Fugu-MT Paper Translation (Summary): Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

Paper summary: Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

arxiv url: http://arxiv.org/abs/2604.10123v1
Date: Sat, 11 Apr 2026 09:38:35 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.855458
Title: Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations
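Title (reference translation): Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations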
Authors: Bernard Muller, Antonio Armando Ortiz Barrañón, LaVonne Roberts,
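Abstract summary: Dysarthric speech severity assessment typically requires supervised models built from labelled pathological speech. We propose a training-free method that quantifies dysarthria severity by measuring degradation in phonological feature subspaces. No supervised severity model is trained; feature directions are estimated from healthy control speech.
Reference score (site-computed attention score): 0.0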
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dysarthric speech severity assessment typically requires trained clinicians or supervised models built from labelled pathological speech, limiting scalability across languages and clinical settings. We present a training-free method that quantifies dysarthria severity by measuring degradation in phonological feature subspaces within frozen HuBERT representations. No supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner. For each speaker, we extract phone-level embeddings via Montreal Forced Aligner, compute d-prime scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy controls, and construct a 12-dimensional phonological profile.
Evaluating 890 speakers across 10 corpora, 5 languages (English, Spanish, Dutch, Mandarin, French), and 3 primary aetiologies (Parkinson's disease, cerebral palsy, ALS), we find that all five consonant d-prime features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2e-4; pooled Spearman rho = -0.47 to -0.55 with bootstrap 95% CIs not crossing zero). The effect replicates within individual corpora, survives FDR correction, and remains robust to leave-one-corpus-out removal and alignment quality controls. Nasality d-prime decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001).
The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages). We release the full pipeline and phone feature configurations for six languages.
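The per-feature d-prime score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released pipeline: it assumes a contrast direction is the unit vector between the healthy-control class means of [+feature] and [-feature] phones, and that d' divides the difference of projected class means by the square root of the average of their sample variances. The helper names (`mean_vector`, `contrast_direction`, `project`, `d_prime`) are hypothetical, and the short 2-D vectors stand in for frozen HuBERT phone embeddings taken from MFA-aligned segments.

```python
from __future__ import annotations

import math
from statistics import mean, variance

# Hypothetical helpers; toy 2-D vectors stand in for frozen HuBERT phone embeddings.


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def contrast_direction(pos: list[list[float]], neg: list[list[float]]) -> list[float]:
    """Unit vector from the [-feature] mean to the [+feature] mean.

    Assumption: estimated from healthy-control phones only, as a plain
    difference of class means (the paper's estimator may differ).
    """
    diff = [p - n for p, n in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project(vector: list[float], direction: list[float]) -> float:
    """Scalar projection of one embedding onto the contrast direction."""
    return sum(a * b for a, b in zip(vector, direction))


def d_prime(pos: list[list[float]], neg: list[list[float]], direction: list[float]) -> float:
    """d' = (mean_+ - mean_-) / sqrt((var_+ + var_-) / 2) along a fixed direction.

    Uses sample variances, so each class needs at least two phones.
    """
    s_pos = [project(v, direction) for v in pos]
    s_neg = [project(v, direction) for v in neg]
    pooled_sd = math.sqrt((variance(s_pos) + variance(s_neg)) / 2)
    return (mean(s_pos) - mean(s_neg)) / pooled_sd


# Healthy controls: nasal vs. oral phone embeddings (toy values).
control_nasal = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]]
control_oral = [[-1.0, 0.0], [-1.2, 0.1], [-0.8, -0.1]]
nasality = contrast_direction(control_nasal, control_oral)

# A hypothetical speaker whose nasal and oral phones have drifted toward each other.
speaker_nasal = [[0.3, 0.0], [0.5, 0.0], [0.1, 0.0]]
speaker_oral = [[-0.3, 0.0], [-0.5, 0.0], [-0.1, 0.0]]

print(round(d_prime(control_nasal, control_oral, nasality), 3))
print(round(d_prime(speaker_nasal, speaker_oral, nasality), 3))
```

With these toy values the control d' is about 10.0 and the speaker's is about 3.0: the two phone classes sit closer together along the fixed control direction, which is the kind of subspace degradation the abstract links to severity. Repeating this for each phonological contrast would fill one entry of a per-speaker profile; the authors' exact direction estimator and variance convention may differ.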
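Abstract (reference translation): Dysarthric speech severity assessment usually requires trained clinicians or supervised models built from labelled pathological speech, which limits scalability across languages and clinical settings. We propose a training-free method that quantifies dysarthria severity by measuring the degradation of phonological feature subspaces within frozen HuBERT representations. No supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner. For each speaker, phone-level embeddings are extracted via the Montreal Forced Aligner, d-prime scores are computed along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived from healthy controls, and a 12-dimensional phonological profile is constructed. The effect replicates within individual corpora, survives FDR correction, and remains robust to leave-one-corpus-out removal and alignment quality controls. Nasality d-prime decreases monotonically from control to severe in 6 of the 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method applies to any language with an existing MFA acoustic model (currently 29 languages). We release the full pipeline and phone feature configurations for six languages.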
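The abstract states that the feature correlations survive FDR correction but does not name the procedure. Assuming the common Benjamini-Hochberg step-up method, adjusted p-values for a set of per-feature tests (for example, one test per profile feature) could be computed as in this sketch; the function name `benjamini_hochberg` and the p-values are illustrative, not taken from the paper.

```python
from __future__ import annotations


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Step-up rule: q_(k) = min over j >= k of (m / j) * p_(j), capped at 1,
    where p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q-value at 1
    # Walk from the largest p-value down so each q is a cumulative minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values for four hypothetical feature tests (not the paper's data).
p = [0.01, 0.2, 0.03, 0.004]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
print([round(qi, 4) for qi in q])
print(significant)
```

A feature counts as significant when its adjusted value is at most the chosen FDR level (0.05 here). Sorting once and taking a cumulative minimum from the largest p-value downward keeps the adjusted values in the same order as the raw p-values.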

Related papers