Fugu-MT Paper Translation (Summary): Dynamic Fusion Multimodal Network for SpeechWellness Detection

Paper summary: Dynamic Fusion Multimodal Network for SpeechWellness Detection

arxiv url: http://arxiv.org/abs/2508.18057v2
Date: Mon, 01 Sep 2025 11:20:37 GMT
Status: Translation complete
Last updated in system: 2025-09-03 12:29:36.78144
Title: Dynamic Fusion Multimodal Network for SpeechWellness Detection
Title (reference translation): Dynamic Fusion Multimodal Network for Speech Wellness Detection
Authors: Wenqiang Sun, Han Yin, Jisheng Bai, Jianfeng Chen
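Abstract summary: Suicide is one of the leading causes of death among adolescents. Previous suicide risk prediction studies have focused mainly on either textual or acoustic information in isolation. This paper explores a lightweight multi-branch multimodal system based on a dynamic fusion mechanism for SpeechWellness detection.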
Reference score (independently computed attention score): 7.169178956727836
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Suicide is one of the leading causes of death among adolescents. Previous suicide risk prediction studies have primarily focused on either textual or acoustic information in isolation. However, the integration of multimodal signals, such as speech and text, offers a more comprehensive understanding of an individual's mental state. Motivated by this, and in the context of the 1st SpeechWellness detection challenge, we explore a lightweight multi-branch multimodal system based on a dynamic fusion mechanism for SpeechWellness detection. To address the limitation of prior approaches that rely on time-domain waveforms for acoustic analysis, our system incorporates both time-domain and time-frequency (TF) domain acoustic features, as well as semantic representations. In addition, we introduce a dynamic fusion block to adaptively integrate information from different modalities. Specifically, it applies learnable weights to each modality during the fusion process, enabling the model to adjust the contribution of each modality. To enhance computational efficiency, we design a lightweight structure by simplifying the original baseline model. Experimental results demonstrate that the proposed system exhibits superior performance compared to the challenge baseline, achieving a 78% reduction in model parameters and a 5% improvement in accuracy.
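Abstract (reference translation): Suicide is one of the leading causes of death among adolescents. Previous suicide risk prediction studies have focused mainly on either textual or acoustic information in isolation, whereas integrating multimodal signals such as speech and text gives a more comprehensive understanding of an individual's mental state. In this work, in the context of the 1st SpeechWellness detection challenge, we explore a lightweight multi-branch multimodal system based on a dynamic fusion mechanism for SpeechWellness detection. To address the limitation of prior approaches that rely on time-domain waveforms for acoustic analysis, the system incorporates time-domain and time-frequency domain acoustic features together with semantic representations. We also introduce a dynamic fusion block that adaptively integrates information from the different modalities. Specifically, it applies learnable weights to each modality during fusion, so the model can adjust each modality's contribution. To improve computational efficiency, we design a lightweight structure by simplifying the original baseline model. Experimental results show that the proposed system outperforms the challenge baseline, with a 78% reduction in model parameters and a 5% improvement in accuracy.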
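
The abstract describes the dynamic fusion block only at a high level: learnable weights are applied to each modality during fusion. The sketch below illustrates one common way such a block could work, and is not the authors' implementation. It assumes (none of this is stated in the abstract) that each branch produces an embedding of the same dimension, that each modality has one scalar learnable logit, that the logits are normalized with a softmax, and that fusion is a weighted sum. The names `dynamic_fusion`, `time_domain`, `tf_domain`, and `semantic` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(embeddings, logits):
    """Fuse per-modality embeddings with softmax-normalized weights.

    embeddings: dict of modality name -> list of floats (equal lengths)
    logits: dict of modality name -> float; in a trained model these
            would be learnable parameters (assumption, see lead-in)
    Returns the fused vector and the per-modality weights.
    """
    names = sorted(embeddings)
    dim = len(embeddings[names[0]])
    if any(len(embeddings[n]) != dim for n in names):
        raise ValueError("all modality embeddings must share one dimension")

    weights = softmax([logits[n] for n in names])
    fused = [0.0] * dim
    for name, weight in zip(names, weights):
        for i, value in enumerate(embeddings[name]):
            fused[i] += weight * value
    return fused, dict(zip(names, weights))


# Hypothetical outputs of the three branches named in the abstract.
embeddings = {
    "time_domain": [1.0, 0.0, 2.0],
    "tf_domain": [0.0, 1.0, 2.0],
    "semantic": [2.0, 2.0, 2.0],
}

# Equal logits give equal weights, so fusion reduces to a plain average.
equal_logits = {"time_domain": 0.0, "tf_domain": 0.0, "semantic": 0.0}
fused_equal, weights_equal = dynamic_fusion(embeddings, equal_logits)

# Raising one logit shifts the fused vector toward that modality.
skewed_logits = {"time_domain": 0.0, "tf_domain": 0.0, "semantic": 3.0}
fused_skewed, weights_skewed = dynamic_fusion(embeddings, skewed_logits)
```

In a trained model the logits would be updated by gradient descent together with the rest of the network, which is how the model would adjust each modality's contribution. The abstract does not say whether the paper's weights are scalars, vectors, or input-dependent, so the scalar form here is only the simplest case.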
