Fugu-MT Paper Translation (Summary): A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition


arxiv url: http://arxiv.org/abs/2111.02172v1
Date: Wed, 3 Nov 2021 12:24:03 GMT
Status: Translation complete
Last updated in system: 2021-11-04 13:16:40.034552
Title: A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
Authors: Ziwang Fu, Feng Liu, Hanyang Wang, Jiayin Qi, Xiangling Fu, Aimin Zhou, Zhibin Li
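Abstract summary: We propose a novel cross-modal fusion network based on self-attention and residual structure (CFN-SR) for multimodal emotion recognition. To verify the effectiveness of the proposed method, experiments were conducted on the RAVDESS dataset. The experimental results confirm that the proposed CFN-SR achieves state-of-the-art performance, with 75.76% accuracy and 26.30M parameters.
Reference score (attention score computed by this site): 7.80238628278552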
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Audio-video based multimodal emotion recognition has attracted much attention due to its robust performance. Most existing methods focus on proposing different cross-modal fusion strategies. However, these strategies introduce redundancy in the features of different modalities without fully considering the complementary properties between modal information, and these approaches do not guarantee that the original semantic information is preserved during intra- and inter-modal interactions. In this paper, we propose a novel cross-modal fusion network based on self-attention and residual structure (CFN-SR) for multimodal emotion recognition. Firstly, we perform representation learning for audio and video modalities to obtain the semantic features of the two modalities by efficient ResNeXt and 1D CNN, respectively. Secondly, we feed the features of the two modalities into the cross-modal blocks separately to ensure efficient complementarity and completeness of information through the self-attention mechanism and residual structure. Finally, we obtain the output of emotions by splicing the obtained fused representation with the original representation. To verify the effectiveness of the proposed method, we conduct experiments on the RAVDESS dataset. The experimental results show that the proposed CFN-SR achieves state-of-the-art performance, obtaining 75.76% accuracy with 26.30M parameters. Our code is available at https://github.com/skeletonNN/CFN-SR.
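The abstract describes the fusion stage only at a high level. The following is a minimal, dependency-free Python sketch of the general idea it names: intra-modal self-attention with a residual connection, inter-modal attention with a residual connection applied in each direction, and finally splicing (concatenating) the fused vectors with the original ones. It assumes both modalities have already been encoded (by ResNeXt and a 1D CNN, per the abstract) into sequences with a shared feature dimension. It is not the authors' implementation: the identity query/key/value projections, the mean pooling, and the names `cross_modal_block` and `cross_modal_fusion` are hypothetical simplifications; the actual CFN-SR code is in the linked repository.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = feature dimensions


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all context rows.

    Assumption: query/key/value projections are the identity; a real block
    would learn a weight matrix for each of them.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * row[j] for w, row in zip(weights, context)) for j in range(d)])
    return out


def residual_attention(queries: Matrix, context: Matrix) -> Matrix:
    # Residual connection: add the attended features back onto the input,
    # so the original information is kept alongside the new interactions.
    attended = attention(queries, context)
    return [[a + b for a, b in zip(x, y)] for x, y in zip(queries, attended)]


def mean_pool(x: Matrix) -> List[float]:
    # Average over the time axis to get one vector per sequence (an assumption).
    return [sum(row[j] for row in x) / len(x) for j in range(len(x[0]))]


def cross_modal_block(x: Matrix, other: Matrix) -> Matrix:
    # Hypothetical block: intra-modal self-attention, then inter-modal attention.
    h = residual_attention(x, x)
    return residual_attention(h, other)


def cross_modal_fusion(audio: Matrix, video: Matrix) -> List[float]:
    # Run the block in both directions, then splice the fused vectors
    # with the pooled original representations.
    audio_fused = cross_modal_block(audio, video)
    video_fused = cross_modal_block(video, audio)
    return mean_pool(audio_fused) + mean_pool(video_fused) + mean_pool(audio) + mean_pool(video)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 audio steps, d = 2
    video = [[0.5, 0.5], [2.0, 0.0]]              # 2 video frames, d = 2
    rep = cross_modal_fusion(audio, video)
    print(len(rep))  # 8: two fused vectors plus two original vectors, each of size 2
```

Because the two sequences only need a shared feature dimension, audio and video may have different lengths; a classifier head (not shown) would map the spliced vector to emotion classes.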

Related papers

TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition [5.9931594640934325]
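Fusion methods based on cross-modal attention show high performance and robustness. In this paper, we propose a Transformer-based Adaptive Cross-modal Fusion Network (TACFN). Experimental results show that TACFN achieves a significant performance improvement over other methods.
Date: 2025-05-10T06:57:58Z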
Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network [12.200776612016698]
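We propose a novel deep inductive learning framework called the Feature Distribution Adaptation Network. The method aims to align the visual and audio feature distributions using a deep transfer learning strategy, in order to obtain consistent representations of emotion.
Date: 2024-10-29T13:13:30Z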
Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations [19.731611716111566]
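We propose a multimodal fusion approach for learning modality-exclusive and modality-agnostic representations. We introduce a predictive self-attention module that captures reliable context dynamics within modalities. A hierarchical cross-modal attention module is designed to explore valuable element correlations across modalities. A double-discriminator strategy is presented to ensure that distinct representations are generated adversarially.
Date: 2024-07-06T04:36:48Z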
Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems. We propose an MMER method that uses a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
Date: 2024-03-15T17:23:38Z
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
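We propose a method that mines cross-modal semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a new network, XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) multi-layer self-supervision.
Date: 2023-05-17T14:30:11Z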
Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
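We propose using invariant features in a missing modality imagination network (IF-MMIN). We show that the proposed model outperforms all baselines and consistently improves overall emotion recognition performance under uncertain missing-modality conditions.
Date: 2022-10-27T12:16:25Z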
LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences [5.570499497432848]
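We propose an efficient neural network for learning modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition. We conduct word-aligned and unaligned experiments on three challenging datasets.
Date: 2021-12-03T03:43:18Z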
Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition [13.994609732846344]
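The most effective emotion recognition techniques make effective use of diverse sources of information. In this paper, we propose a cross-attentional fusion approach to extract salient features across audio-visual (A-V) modalities. The results suggest that our A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
Date: 2021-11-09T16:01:56Z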
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
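Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Because of the known information imbalance among modalities, the model takes two bimodal pairs as input.
Date: 2021-07-28T23:33:42Z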
Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
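We propose a parameter-free multimodal fusion framework that dynamically exchanges channels between the sub-networks of different modalities. The effectiveness of this exchange process is ensured by sharing convolutional filters while keeping the BN layers separate for each modality.
Date: 2020-11-10T09:53:20Z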
Domain Private and Agnostic Feature for Modality Adaptive Face Recognition [10.497190559654245]
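We propose a feature aggregation network (FAN) that includes a disentangled representation module (DRM), a feature fusion module (FFM), and a metric penalty learning session. First, in the DRM, two sub-networks, a domain-private network and a domain-agnostic network, are specially designed to learn modality features and identity features. Second, in the FFM, identity features are fused with domain features to achieve bidirectional cross-modal identity feature transformation. Third, since a distribution imbalance between easy and hard pairs exists in cross-modal datasets, adaptive identity-preserving metric learning is introduced.
Date: 2020-08-10T00:59:42Z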
Cross-modality Person re-identification with Shared-Specific Feature Transfer [112.60513494602337]
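Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis. We propose a cross-modality shared-specific feature transfer algorithm (cm-SSFT) to explore the potential of both modality-shared information and modality-specific characteristics.
Date: 2020-02-28T00:18:45Z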

The related paper list is generated automatically from the titles and abstracts of papers on this site.

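This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).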