Fugu-MT Paper Translation (Abstract): The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology

Paper overview: The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology

arxiv url: http://arxiv.org/abs/2509.16765v1
Date: Sat, 20 Sep 2025 18:10:30 GMT
Status: Translation complete
System update date: 2025-09-23 18:58:15.963557
Title: The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology
Title (reference translation): The Sound of Syntax: Fine-Tuning and Comprehensive Evaluation of Language Models for Speech Pathology
Authors: Fagun Patel, Duc Q. Nguyen, Sang T. Truong, Jody Vaynshtok, Sanmi Koyejo, Nick Haber
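Abstract summary: More than 3.4 million children experience speech disorders that require clinical intervention. There are roughly 20 times fewer speech-language pathologists (SLPs) than affected children.
Reference score (independently computed attention score): 28.33400979049354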
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: According to the U.S. National Institutes of Health, more than 3.4 million children experience speech disorders that require clinical intervention. The number of speech-language pathologists (SLPs) is roughly 20 times fewer than the number of affected children, highlighting a significant gap in children's care and a pressing need for technological support that improves the productivity of SLPs. State-of-the-art multimodal language models (MLMs) show promise for supporting SLPs, but their use remains underexplored largely due to a limited understanding of their performance in high-stakes clinical settings. To address this gap, we collaborate with domain experts to develop a taxonomy of real-world use cases of MLMs in speech-language pathologies. Building on this taxonomy, we introduce the first comprehensive benchmark for evaluating MLM across five core use cases, each containing 1,000 manually annotated data points. This benchmark includes robustness and sensitivity tests under various settings, including background noise, speaker gender, and accent. Our evaluation of 15 state-of-the-art MLMs reveals that no single model consistently outperforms others across all tasks. Notably, we find systematic disparities, with models performing better on male speakers, and observe that chain-of-thought prompting can degrade performance on classification tasks with large label spaces and narrow decision boundaries. Furthermore, we study fine-tuning MLMs on domain-specific data, achieving improvements of over 30% compared to base models. These findings highlight both the potential and limitations of current MLMs for speech-language pathology applications, underscoring the need for further research and targeted development.
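Abstract (reference translation): According to the U.S. National Institutes of Health, more than 3.4 million children experience speech disorders that require clinical intervention. There are roughly 20 times fewer speech-language pathologists (SLPs) than affected children, which highlights a significant gap in children's care and a pressing need for technological support that improves SLP productivity. State-of-the-art multimodal language models (MLMs) show promise for supporting SLPs, but their use remains underexplored, largely because their performance in high-stakes clinical settings is poorly understood. To address this gap, we collaborate with domain experts to develop a taxonomy of real-world use cases of MLMs in speech-language pathology. Building on this taxonomy, we introduce the first comprehensive benchmark for evaluating MLMs across five core use cases, each containing 1,000 manually annotated data points. The benchmark includes robustness and sensitivity tests under various settings, including background noise, speaker gender, and accent. Our evaluation of 15 state-of-the-art MLMs shows that no single model consistently outperforms the others across all tasks. In particular, we find systematic disparities, with models performing better on male speakers, and observe that chain-of-thought prompting can degrade performance on classification tasks with large label spaces and narrow decision boundaries. We also study fine-tuning MLMs on domain-specific data, achieving improvements of over 30% compared to base models. These findings highlight both the potential and the limitations of current MLMs for speech-language pathology applications and underscore the need for further research and targeted development.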
