Fugu-MT 論文翻訳(概要): Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition

論文の概要: Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition

arxiv url: http://arxiv.org/abs/2312.16778v1
Date: Thu, 28 Dec 2023 01:57:26 GMT
ステータス: 翻訳完了
システム内更新日: 2023-12-29 18:05:24.569315
Title: Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition
Title（参考訳）: マルチモーダル感情認識のためのモーダル内およびモーダル間グラフコントラスト学習による敵意表現
Authors: Yuntao Shou, Tao Meng, Wei Ai and Keqin Li
Abstract要約: マルチモーダル感情認識 (AR-IIGCN) 法に対して, モーダル内およびモーダル間グラフコントラストを用いた新しい適応表現を提案する。まず、ビデオ、オーディオ、テキストの特徴を多層パーセプトロン(MLP)に入力し、それらを別々の特徴空間にマッピングする。第2に,逆表現による3つのモーダル特徴に対するジェネレータと判別器を構築する。第3に、モーダル内およびモーダル間相補的意味情報を取得するために、コントラッシブグラフ表現学習を導入する。
参考スコア（独自算出の注目度）: 15.4676247289299
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the release of increasing open-source emotion recognition datasets on social media platforms and the rapid development of computing resources, multimodal emotion recognition tasks (MER) have begun to receive widespread research attention. The MER task extracts and fuses complementary semantic information from different modalities, which can classify the speaker's emotions. However, the existing feature fusion methods have usually mapped the features of different modalities into the same feature space for information fusion, which can not eliminate the heterogeneity between different modalities. Therefore, it is challenging to make the subsequent emotion class boundary learning. To tackle the above problems, we have proposed a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive for Multimodal Emotion Recognition (AR-IIGCN) method. Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces. Secondly, we build a generator and a discriminator for the three modal features through adversarial representation, which can achieve information interaction between modalities and eliminate heterogeneity among modalities. Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information and learn intra-class and inter-class boundary information of emotion categories. Specifically, we construct a graph structure for three modal features and perform contrastive representation learning on nodes with different emotions in the same modality and the same emotion in different modalities, which can improve the feature representation ability of nodes. Extensive experimental works show that the ARL-IIGCN method can significantly improve emotion recognition accuracy on IEMOCAP and MELD datasets.
Abstract（参考訳）: ソーシャルメディアプラットフォームにおけるオープンソースの感情認識データセットの増加と、コンピューティングリソースの急速な発展により、マルチモーダル感情認識タスク(mer)が広く研究の注目を集めている。 merタスクは、異なるモダリティから補完的な意味情報を抽出し、融合し、話者の感情を分類する。しかし、既存の特徴融合法は通常、異なるモダリティの特徴を情報融合のための同じ特徴空間にマッピングしており、異なるモダリティ間の不均一性を排除することはできない。したがって、その後の感情クラス境界学習を行うことは困難である。そこで本研究では,マルチモーダル感情認識(AR-IIGCN)法に対して,モーダル内およびモーダル間グラフを用いた適応表現を提案する。まず、ビデオ、オーディオ、テキストの特徴を多層パーセプトロン(MLP)に入力し、それらを別々の特徴空間にマッピングする。第2に,モーダル間の情報相互作用を実現し,モーダル間の不均一性を排除できる3つのモーダル特徴のジェネレータと判別器を構築する。第3に,モーダル内およびモーダル間補完的意味情報を取り込んで感情カテゴリーのクラス内およびクラス間境界情報を学ぶために,コントラストグラフ表現学習を導入する。具体的には,3つのモーダル特徴のグラフ構造を構築し,同じモーダル性において異なる感情と異なるモーダル性で同じ感情を持つノード上での対比表現学習を行い,ノードの特徴表現能力を向上させる。大規模な実験により、ARL-IIGCN法はIEMOCAPおよびMELDデータセット上での感情認識精度を大幅に向上できることが示された。

関連論文リスト

GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints [24.242098942377574]
マルチモーダル感情認識(MER)は、視覚、音声、テキスト入力を含むマルチモーダルデータから感情を抽出し、人間とコンピュータの相互作用において重要な役割を果たす。本稿では,相互の相互作用を通じて感情情報を高めつつ,モダリティ特有の特徴を適応的に抽出する対話型アテンション機構を提案する。 IEMOCAPの実験では、我々の手法は最先端のMERアプローチより優れており、WA 80.7%、UA 81.3%を達成している。
論文参考訳（メタデータ） (2025-06-01T07:07:02Z)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
マルチモーダル・センティメント・アナリティクスは、テキスト、音声、視覚データを融合することで人間の感情を解き放つことを目指している。しかし、音声やビデオの表現の中で微妙な感情的なニュアンスを認識することは、恐ろしい挑戦だ。テキストの感情記述に基づくプログレッシブ・フュージョン・フレームワークであるDEVAを紹介する。
論文参考訳（メタデータ） (2024-12-12T11:30:41Z)
WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition [2.3367170233149324]
We propose WavFusion, a multimodal speech emotion recognition framework。 WavFusionは、効果的なマルチモーダル融合、モダリティ、差別的表現学習における重要な研究課題に対処する。本研究は, 精度の高いマルチモーダルSERにおいて, ニュアンスな相互モーダル相互作用を捉え, 識別表現を学習することの重要性を強調した。
論文参考訳（メタデータ） (2024-12-07T06:43:39Z)
Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation [12.455034591553506]
対話におけるマルチモーダル感情認識(MERC)は、世論監視、インテリジェントな対話ロボット、その他の分野に適用することができる。従来の作業では、マルチモーダル融合前のモーダル間アライメントプロセスとモーダル内ノイズ情報を無視していた。我々は,MGLRA(Masked Graph Learning with Recursive Alignment)と呼ばれる新しい手法を開発し,この問題に対処した。
論文参考訳（メタデータ） (2024-07-23T02:23:51Z)
Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning [40.101313334772016]
会話における感情認識の目的は、文脈情報に基づいて発話の感情カテゴリーを特定することである。従来のERC法は、クロスモーダル核融合のための単純な接続に依存していた。本稿では,ベクトル接続に基づくモーダル融合感情予測ネットワークを提案する。
論文参考訳（メタデータ） (2024-05-28T07:22:30Z)
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
AIMDiTと呼ばれる新しいフレームワークを提案し、深い特徴のマルチモーダル融合の問題を解決する。公開ベンチマークデータセットMELDでAIMDiTフレームワークを使用して行った実験では、Acc-7とw-F1メトリクスの2.34%と2.87%の改善が明らかにされた。
論文参考訳（メタデータ） (2024-04-12T11:31:18Z)
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition [18.571931295274975]
マルチモーダル感情認識は、複数のモーダルの発話毎に感情を認識することを目的としている。現在のグラフベースの手法では、対話においてグローバルな文脈特徴と局所的な多様なユニモーダル特徴を同時に表現することができない。マルチモーダル感情認識のための共同モーダル融合法とグラフコントラスト学習法(Joyful)を提案する。
論文参考訳（メタデータ） (2023-11-18T08:21:42Z)
Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
マルチモーダルエンティティリンクタスクは、マルチモーダル知識グラフへの曖昧な言及を解決することを目的としている。 MELタスクを解決するための新しいMulti-Grained Multimodal InteraCtion Network $textbf(MIMIC)$ frameworkを提案する。
論文参考訳（メタデータ） (2023-07-19T02:11:19Z)
EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
状況知識を用いた説明可能なマルチモーダル感情認識(EMERSK)を提案する。 EMERSKは視覚情報を用いた人間の感情認識と説明のための汎用システムである。本システムは, 表情, 姿勢, 歩行などの複数のモーダルを柔軟かつモジュラーな方法で処理することができる。
論文参考訳（メタデータ） (2023-06-14T17:52:37Z)
High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning [112.51498431119616]
本稿では,多種多様なモダリティを含む高モダリティシナリオに対する効率的な表現学習について検討する。単一のモデルであるHighMMTは、テキスト、画像、オーディオ、ビデオ、センサー、プロプレセプション、スピーチ、時系列、セット、テーブル)と5つの研究領域から15のタスクをスケールする。
論文参考訳（メタデータ） (2022-03-02T18:56:20Z)
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
本稿では,音声とテキストのモダリティから,伝達学習モデルと微調整モデルとを融合したニューラルネットワークによる感情認識フレームワークを提案する。本稿では,対話型感情的モーションキャプチャー・データセットにおけるマルチモーダル・アプローチの有効性を評価する。
論文参考訳（メタデータ） (2022-02-16T00:23:42Z)
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
本稿では、LSTM隠蔽状態上の注目に基づく双方向アライメントネットワークで構成されるGBAN(Gated Bidirectional Alignment Network)と呼ばれる新しいモデルを提案する。 LSTMの最後の隠れ状態よりもアテンション整列表現の方が有意に優れていたことを実証的に示す。提案したGBANモデルは、IEMOCAPデータセットにおける既存の最先端マルチモーダルアプローチよりも優れている。
論文参考訳（メタデータ） (2022-01-17T09:46:59Z)
Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition [7.147235324895931]
本稿では,より情報に富んだマルチモーダル表現を学習する階層型グラフネットワーク(HFGCN)モデルを提案する。具体的には,2段階グラフ構築手法を用いてマルチモーダル入力を融合し,モダリティ依存性を会話表現にエンコードする。実験により,より正確なAERモデルの有効性が示された。
論文参考訳（メタデータ） (2021-09-15T08:21:01Z)
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
マルチモーダル感情認識(MER)のいくつかの重要な側面について論じる。まず、広く使われている感情表現モデルと感情モダリティの簡単な紹介から始める。次に、既存の感情アノテーション戦略とそれに対応する計算タスクを要約する。最後に,実世界のアプリケーションについて概説し,今後の方向性について論じる。
論文参考訳（メタデータ） (2021-08-18T21:55:20Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。