Fugu-MT Paper Translation (Abstract): HI-TransPA: Hearing Impairments Translation Personal Assistant

Paper overview: HI-TransPA: Hearing Impairments Translation Personal Assistant

arxiv url: http://arxiv.org/abs/2511.09915v2
Date: Fri, 14 Nov 2025 18:05:10 GMT
Status: Translation complete
Last updated in system: 2025-11-17 14:38:02.178723
Title: HI-TransPA: Hearing Impairments Translation Personal Assistant
Title (reference translation): HI-TransPA: Translation Assistant for the Hearing Impaired
Authors: Zhiming Ma, Shiyu Gan, Junhao Zhao, Xianming Li, Qingyun Pan, Peidong Wang, Mingjun Pan, Yuhao Mo, Jiajie Cheng, Chengxin Chen, Zhonglun Cao, Chonghan Liu, Shi Cheng
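Abstract summary: We introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with lip dynamics, enabling both translation and dialogue within a single multimodal framework. Experiments on the HI-Dialogue dataset show that HI-TransPA achieves state-of-the-art performance in both literal accuracy and semantic fidelity.
Reference score (independently computed attention score): 23.33416647487016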
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hearing-impaired individuals often face significant barriers in daily communication due to the inherent challenges of producing clear speech. To address this, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with lip dynamics, enabling both translation and dialogue within a single multimodal framework. To address the distinctive pronunciation patterns of hearing-impaired speech and the limited adaptability of existing models, we develop a multimodal preprocessing and curation pipeline that detects facial landmarks, stabilizes the lip region, and quantitatively evaluates sample quality. These quality scores guide a curriculum learning strategy that first trains on clean, high-confidence samples and progressively incorporates harder cases to strengthen model robustness. Architecturally, we employ a novel unified 3D-Resampler to efficiently encode the lip dynamics, which is critical for accurate interpretation. Experiments on the purpose-built HI-Dialogue dataset show that HI-TransPA achieves state-of-the-art performance in both literal accuracy and semantic fidelity. Our work establishes a foundation for applying Omni-Models to assistive communication technology, providing an end-to-end modeling framework and essential processing tools for future research.
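Abstract (reference translation): People with hearing impairments often face serious barriers in everyday communication because producing clear speech is inherently difficult for them. We therefore bring the Omni-Model paradigm to assistive technology and propose HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with lip dynamics, enabling translation and dialogue within a single multimodal framework. To handle the distinctive pronunciation patterns of hearing-impaired speech and the limited adaptability of existing models, we developed a multimodal preprocessing and curation pipeline that detects facial landmarks, stabilizes the lip region, and quantitatively evaluates sample quality. These quality scores drive a curriculum learning strategy that trains first on clean, high-confidence samples and then gradually adds harder cases to strengthen model robustness. Architecturally, we adopt a new unified 3D-Resampler to encode lip dynamics efficiently. Experiments on the HI-Dialogue dataset show that HI-TransPA achieves state-of-the-art performance in both literal accuracy and semantic fidelity. This work lays a foundation for applying Omni-Models to assistive communication technology and provides an end-to-end modeling framework and processing tools essential for future research.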
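
The abstract says that per-sample quality scores guide a curriculum that starts with clean, high-confidence samples and progressively admits harder ones. The sketch below illustrates that general idea only. It is not the authors' implementation: the function name `curriculum_stages`, the sample IDs, the score range of 0 to 1, and the threshold values are all assumptions made for illustration, and the abstract does not specify how scores are computed or how stages are scheduled.

```python
from typing import List, Sequence, Tuple

# (sample_id, quality_score); both fields are hypothetical placeholders.
Sample = Tuple[str, float]


def curriculum_stages(
    samples: Sequence[Sample], thresholds: Sequence[float]
) -> List[List[str]]:
    """Build one training pool per stage from quality scores.

    Stage k admits every sample whose score is >= thresholds[k].
    Thresholds must be strictly decreasing, so each pool contains
    the previous pool plus progressively harder samples.
    """
    if any(a <= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly decreasing")
    # Rank by descending quality so cleaner samples come first in every pool.
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    return [[sid for sid, score in ranked if score >= t] for t in thresholds]


# Illustrative scores only; these are not values from the paper.
samples = [("clip_a", 0.95), ("clip_b", 0.40), ("clip_c", 0.72), ("clip_d", 0.10)]
stages = curriculum_stages(samples, thresholds=[0.9, 0.5, 0.0])
for i, pool in enumerate(stages):
    print(f"stage {i}: {pool}")
```

A training loop would then run one or more epochs over each pool in order. Cumulative pools are one possible design choice, made here so that the model keeps seeing clean samples while harder ones are introduced.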

Related papers