Fugu-MT Paper Translation (Summary): TACOformer:Token-channel compounded Cross Attention for Multimodal Emotion Recognition

Paper summary: TACOformer:Token-channel compounded Cross Attention for Multimodal Emotion Recognition

arxiv url: http://arxiv.org/abs/2306.13592v2
Date: Mon, 21 Aug 2023 16:37:46 GMT
Status: Translation complete
Last updated in system: 2023-08-22 23:27:19.406816
Title: TACOformer:Token-channel compounded Cross Attention for Multimodal Emotion Recognition
Title (reference translation): TACOformer: Token-channel compounded cross attention for multimodal emotion recognition
Authors: Xinda Li
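Abstract summary: This paper proposes a comprehensive perspective on multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, it introduces a cross attention module called Token-chAnnel COmpound (TACO) Cross Attention. It also proposes a 2D position encoding method to preserve information about the spatial distribution of EEG signal channels.
Reference score (attention score computed by this site): 0.951828574518325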
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, emotion recognition based on physiological signals has emerged as a field with intensive research. The utilization of multi-modal, multi-channel physiological signals has significantly improved the performance of emotion recognition systems, due to their complementarity. However, effectively integrating emotion-related semantic information from different modalities and capturing inter-modal dependencies remains a challenging issue. Many existing multimodal fusion methods ignore either token-to-token or channel-to-channel correlations of multichannel signals from different modalities, which limits the classification capability of the models to some extent. In this paper, we propose a comprehensive perspective of multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, we introduce a unified cross attention module called Token-chAnnel COmpound (TACO) Cross Attention to perform multimodal fusion, which simultaneously models channel-level and token-level dependencies between modalities. Additionally, we propose a 2D position encoding method to preserve information about the spatial distribution of EEG signal channels, then we use two transformer encoders ahead of the fusion module to capture long-term temporal dependencies from the EEG signal and the peripheral physiological signal, respectively. Subject-independent experiments on emotional dataset DEAP and Dreamer demonstrate that the proposed model achieves state-of-the-art performance.
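Abstract (reference translation): In recent years, emotion recognition based on physiological signals has emerged as a field of intensive research. The use of multimodal, multichannel physiological signals has significantly improved the performance of emotion recognition systems owing to their complementarity. However, effectively integrating emotion-related semantic information from different modalities and capturing inter-modal dependencies remains a difficult problem. Many existing multimodal fusion methods ignore either the token-to-token or the channel-to-channel correlations of multichannel signals from different modalities, which limits the classification capability of the models to some extent. This paper proposes a comprehensive perspective on multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, to perform multimodal fusion, it introduces a unified cross attention module called Token-chAnnel COmpound (TACO) Cross Attention, which simultaneously models channel-level and token-level dependencies between modalities. It further proposes a 2D position encoding method that preserves information about the spatial distribution of EEG signal channels, and uses two transformer encoders ahead of the fusion module to capture long-term temporal dependencies from the EEG signal and the peripheral physiological signal, respectively. Subject-independent experiments on the emotion datasets DEAP and Dreamer show that the proposed model achieves state-of-the-art performance.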
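
As a rough illustration of what "token-level" and "channel-level" cross-modal dependencies could mean, the NumPy sketch below computes one attention map over time steps (tokens) and one over feature channels, then applies both to the second modality's features. This is not the formulation from the paper: the abstract does not specify how the two maps are compounded, so the product `a_tok @ x_b @ a_ch.T`, the omission of learned query/key/value projections, and the names `softmax`, `compound_cross_attention`, `x_a`, and `x_b` are all assumptions made for illustration. It also assumes both modalities have already been brought to the same shape `(T, C)`.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def compound_cross_attention(x_a, x_b):
    """Illustrative token- and channel-level cross attention (assumed form).

    x_a: (T, C) features of the querying modality (e.g. EEG).
    x_b: (T, C) features of the attended modality (e.g. peripheral signals).
    Returns the fused features (T, C) plus both attention maps.
    Learned projections are deliberately left out to keep the sketch short.
    """
    T, C = x_a.shape
    if x_b.shape != (T, C):
        raise ValueError("this sketch assumes both inputs have shape (T, C)")

    # Token-level map: each time step of x_a attends over time steps of x_b.
    a_tok = softmax(x_a @ x_b.T / np.sqrt(C), axis=-1)  # (T, T)

    # Channel-level map: each channel of x_a attends over channels of x_b.
    a_ch = softmax(x_a.T @ x_b / np.sqrt(T), axis=-1)   # (C, C)

    # Assumed compounding: mix x_b along tokens (left) and channels (right).
    fused = a_tok @ x_b @ a_ch.T                        # (T, C)
    return fused, a_tok, a_ch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eeg = rng.normal(size=(8, 4))         # hypothetical: 8 tokens, 4 channels
    peripheral = rng.normal(size=(8, 4))  # hypothetical second modality
    fused, a_tok, a_ch = compound_cross_attention(eeg, peripheral)
    print(fused.shape, a_tok.shape, a_ch.shape)
```

Because both attention maps are row-normalized, every entry of the output is a weighted average of entries of `x_b`, with the weights set by its similarity to `x_a` along the token axis and the channel axis.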

Related papers

VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains [3.303674512749726]
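This paper proposes a novel multi-scale attention-based LSTM architecture combined with Squeeze-and-Excitation (SE) blocks. The proposed architecture is validated in a user study and shows superior performance in classifying valence and arousal levels.
Paper reference translation (metadata) (2024-12-03T08:59:12Z)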
Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation [12.455034591553506]
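Multimodal emotion recognition in conversation (MERC) can be applied to public opinion monitoring, intelligent dialogue robots, and other fields. Previous work ignored the inter-modal alignment process and intra-modal noise information before multimodal fusion. To address this problem, we develop a novel method called Masked Graph Learning with Recursive Alignment (MGLRA).
Paper reference translation (metadata) (2024-07-23T02:23:51Z)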
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition [18.65975882665568]
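Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) has made considerable progress. This paper proposes a multimodal physiological signal representation learning framework that uses multiscale contrasting for depression recognition. A semantic contrast module is proposed to enhance the learning of semantic representations associated with stimulation tasks.
Paper reference translation (metadata) (2024-06-22T09:28:02Z)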
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
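This paper addresses the task of arbitrary-modality salient object detection (AM SOD), which aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) is proposed to tackle two fundamental challenges of AM SOD.
Paper reference translation (metadata) (2024-05-06T11:02:02Z)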
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
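Multimodal emotion recognition based on audio and video data is important for real-world applications. Recent methods have focused on exploiting advances in self-supervised learning (SSL) to pre-train strong multimodal encoders. We take a different perspective on this problem and investigate improving the performance of multimodal DFER using SSL-pre-trained unimodal encoders.
Paper reference translation (metadata) (2024-04-13T13:39:26Z)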
Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
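Multimodal emotion recognition (MMER) systems typically outperform unimodal systems. This paper proposes an MMER method that uses a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
Paper reference translation (metadata) (2024-03-15T17:23:38Z)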
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
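We propose a method that mines cross-modal semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
Paper reference translation (metadata) (2023-05-17T14:30:11Z)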
Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition [2.4364387374267427]
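We propose a novel self-supervised learning (SSL) framework for wearable emotion recognition. Our method achieved state-of-the-art results on various emotion classification tasks.
Paper reference translation (metadata) (2023-03-29T19:45:55Z)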
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
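This paper proposes a new model called the Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states. We show empirically that the attention-aligned representations significantly outperform the last hidden states of the LSTM. The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
Paper reference translation (metadata) (2022-01-17T09:46:59Z)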
Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
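This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. The validity of this exchanging process is guaranteed by sharing convolutional filters while keeping separate BN layers across modalities.
Paper reference translation (metadata) (2020-11-10T09:53:20Z)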
Low Rank Fusion based Transformers for Multimodal Sequences [9.507869508188266]
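We present two methods for multimodal sentiment and emotion recognition using the CMU-MOSEI, CMU-MOSI, and IEMOCAP datasets. Our models have fewer parameters, train faster, and perform comparably to many larger fusion-based architectures.
Paper reference translation (metadata) (2020-07-04T08:05:40Z)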

The related papers list is generated automatically from the titles and abstracts of papers on this site.
