Fugu-MT Paper Translation (Summary): DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation

Paper summary: DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation

arXiv URL: http://arxiv.org/abs/2509.17711v1
Date: Mon, 22 Sep 2025 12:48:42 GMT
Status: Translation complete
System update date: 2025-09-23 18:58:16.386525
Title: DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation
Title (reference translation): DA-Mamba: A dialogue-aware selective state-space model for multimodal engagement estimation
Authors: Shenwei Kang, Xin Zhang, Wen Liu, Bin Li, Yujie Liu, Bo Gao,
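Abstract summary: DA-Mamba is a dialogue-aware multimodal architecture that replaces attention-heavy dialogue encoders with Mamba-based selective state-space processing. DA-Mamba surpasses prior state-of-the-art (SOTA) methods in concordance correlation coefficient (CCC) while reducing training time and peak memory, which enables processing of longer sequences and facilitates real-time deployment in resource-constrained, multi-party conversational settings.
Reference score (site-computed attention score): 15.106664911098882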
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human engagement estimation in conversational scenarios is essential for applications such as adaptive tutoring, remote healthcare assessment, and socially aware human-computer interaction. Engagement is a dynamic, multimodal signal conveyed over time by facial expressions, speech, gestures, and behavioral cues. In this work, we introduce DA-Mamba, a dialogue-aware multimodal architecture that replaces attention-heavy dialogue encoders with Mamba-based selective state-space processing to achieve time and memory complexity linear in sequence length while retaining expressive cross-modal reasoning. The model is composed of three core modules: a Dialogue-Aware Encoder and two Mamba-based fusion mechanisms, Modality-Group Fusion and Partner-Group Fusion, which together provide expressive dialogue understanding. Extensive experiments on three standard benchmarks (NoXi, NoXi-Add, and MPIIGI) show that DA-Mamba surpasses prior state-of-the-art (SOTA) methods in concordance correlation coefficient (CCC) while reducing training time and peak memory. These gains make it possible to process much longer sequences and facilitate real-time deployment in resource-constrained, multi-party conversational settings. The source code will be available at https://github.com/kksssssss-ssda/MMEA.
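Abstract (reference translation): Human engagement estimation in conversational scenarios is essential for applications such as adaptive tutoring, remote healthcare assessment, and socially aware human-computer interaction. Engagement is a dynamic, multimodal signal conveyed over time by facial expressions, speech, gestures, and behavioral cues. In this work, we present DA-Mamba, a dialogue-aware multimodal architecture that replaces attention-heavy dialogue encoders with Mamba-based selective state-space processing, achieving linear time and memory complexity while retaining expressive cross-modal reasoning. We designed a dialogue-aware selective state-space model built on Mamba, composed of three core modules: a Dialogue-Aware Encoder and two Mamba-based fusion mechanisms, Modality-Group Fusion and Partner-Group Fusion. Extensive experiments on three standard benchmarks (NoXi, NoXi-Add, and MPIIGI) show that DA-Mamba surpasses prior state-of-the-art (SOTA) methods in concordance correlation coefficient (CCC) while reducing training time and peak memory. The source code will be available at https://github.com/kksssssss-ssda/MMEA.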
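
The abstract attributes DA-Mamba's linear time and memory cost to Mamba-based selective state-space processing. As a rough illustration of why a selective scan is linear in sequence length, here is a toy single-channel recurrence in the style of Mamba: the step size Delta_t, the input matrix B_t and the readout C_t all depend on the current input, and only an N-dimensional state is carried forward. The diagonal A, the scalar weights `w_b`, `w_c`, `w_delta`, `b_delta`, the function name `selective_scan`, and the simplified (Euler) discretization of B are our own assumptions for illustration. This is not DA-Mamba's Dialogue-Aware Encoder or either fusion module, whose internals the abstract does not describe.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)); keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],
    w_b: Sequence[float],
    w_c: Sequence[float],
    w_delta: float,
    b_delta: float,
) -> List[float]:
    """Toy single-channel selective state-space scan in the style of Mamba.

    x        -- input sequence of length T (one feature channel)
    a        -- diagonal of the state matrix A, length N (negative for stability)
    w_b, w_c -- hypothetical weights making B_t and C_t depend on x_t
    w_delta, b_delta -- hypothetical weights for the input-dependent step size
    Runs in O(T * N) time and carries only N floats of recurrent state.
    """
    n = len(a)
    if len(w_b) != n or len(w_c) != n:
        raise ValueError("a, w_b and w_c must have the same length N")
    h = [0.0] * n  # recurrent state, the only thing passed between time steps
    ys: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # selective step size Delta_t
        y_t = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_t = w_b[i] * x_t              # input-dependent B_t
            c_t = w_c[i] * x_t              # input-dependent C_t
            # h_t = A_bar * h_{t-1} + (Delta_t * B_t) * x_t  (simplified B discretization)
            h[i] = a_bar * h[i] + delta * b_t * x_t
            y_t += c_t * h[i]               # y_t = C_t h_t
        ys.append(y_t)
    return ys
```

For example, `selective_scan([1.0, 1.0], [-1.0], [1.0], [1.0], 0.0, 0.0)` returns approximately `[0.693, 1.040]` (that is, ln 2 and 1.5 ln 2). Each time step does O(N) work, so a length-T sequence costs O(T·N) time with a fixed-size state, whereas full self-attention builds an O(T²) matrix of pairwise scores. The actual Mamba layer runs this kind of recurrence over many channels with a hardware-aware parallel scan rather than a Python loop.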

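Results are reported in concordance correlation coefficient (CCC). A minimal sketch of Lin's CCC over two equal-length series (for instance, predicted and annotated engagement per frame) is shown here. The function name is ours, and using population (1/n) variances is an assumption; the abstract does not state which convention the benchmarks use.

```python
from statistics import fmean
from typing import Sequence


def concordance_correlation(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two equal-length series.

    CCC = 2 * cov(p, g) / (var(p) + var(g) + (mean(p) - mean(g)) ** 2)
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")
    n = len(pred)
    mp, mg = fmean(pred), fmean(gold)
    # Population (1/n) variances and covariance
    var_p = sum((p - mp) * (p - mp) for p in pred) / n
    var_g = sum((g - mg) * (g - mg) for g in gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mp - mg) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2.0 * cov / denom
```

For example, `concordance_correlation([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0])` gives 2.5 / 3.5 ≈ 0.714 even though the Pearson correlation of the same pair is 1.0: unlike Pearson correlation, CCC also penalizes a constant offset between prediction and annotation.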

Related papers