Fugu-MT Paper Translation (Summary): Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition


arxiv url: http://arxiv.org/abs/2308.04502v2
Date: Sat, 12 Aug 2023 06:05:26 GMT
Status: Translation complete
Updated in system: 2023-08-15 18:21:04.795083
Title: Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Authors: Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li
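Abstract summary: Feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. We propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration. Our system consistently achieves new state-of-the-art performance.
Reference score (site-computed attention score): 81.2011058113579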
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios, the task of multimodal emotion recognition in conversation (MM-ERC), has been a hot research topic. MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed to improve task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both feature multimodality and conversational contextualization should be modeled properly and simultaneously during the feature disentanglement and fusion steps. In this work, we aim to push task performance further by taking full account of these insights. On the one hand, during feature disentanglement, we devise a Dual-level Disentanglement Mechanism (DDM) based on contrastive learning to decouple the features into both the modality space and the utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. Together they schedule the proper integration of multimodal and context features. Specifically, CFM explicitly and dynamically manages the contributions of multimodal features, while CRM flexibly coordinates the introduction of dialogue context. On two public MM-ERC datasets, our system consistently achieves new state-of-the-art performance. Further analyses demonstrate that all the proposed mechanisms greatly facilitate the MM-ERC task by making full, adaptive use of multimodal and context features. Note that the proposed methods have great potential to facilitate a broader range of conversational multimodal tasks.
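The abstract states only that DDM is built on contrastive learning and separates features into a modality space and an utterance space. One plausible reading, offered here as an assumption and not as the authors' formulation, is a supervised contrastive loss applied twice: in the modality space, representations of the same modality are positives; in the utterance space, representations of the same utterance (across modalities) are positives. The function names, the temperature of 0.1, and the two projected inputs below are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def supervised_contrastive_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[object],
    temperature: float = 0.1,
) -> float:
    """SupCon-style loss: samples sharing a label are positives, all others negatives."""
    n = len(features)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without any positive are skipped
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        # numerically stable log-sum-exp over every non-anchor sample
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # average negative log-probability assigned to the anchor's positives
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


def dual_level_loss(
    modality_space: Sequence[Sequence[float]],
    utterance_space: Sequence[Sequence[float]],
    modality_ids: Sequence[str],
    utterance_ids: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Hypothetical dual-level objective: one contrastive term per space.

    Row k of both inputs describes the same (utterance, modality) pair,
    projected into the modality space and the utterance space respectively.
    """
    modality_term = supervised_contrastive_loss(modality_space, modality_ids, temperature)
    utterance_term = supervised_contrastive_loss(utterance_space, utterance_ids, temperature)
    return modality_term + utterance_term
```

In a trained model the two inputs would come from two learned projection heads (not shown); minimizing the sum pulls same-modality features together in one space and same-utterance features together in the other.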
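For CFM, the abstract says only that it explicitly and dynamically manages how much each modality contributes. A common way to make such contributions explicit is softmax gating: score each modality's feature, normalize the scores across modalities, and take the weighted sum. The sketch below shows that generic technique, not the paper's CFM; the linear scoring vectors in the hypothetical `score_weights` argument stand in for a learned scorer.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contribution_weighted_fusion(
    modality_feats: Dict[str, List[float]],
    score_weights: Dict[str, List[float]],
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse equal-length per-modality vectors with softmax contribution weights.

    Returns the fused vector and the per-modality weights, so each modality's
    contribution to this utterance can be inspected.
    """
    names = sorted(modality_feats)  # fixed order keeps weights aligned with names
    # one scalar relevance score per modality (a linear scorer in this sketch)
    scores = [
        sum(w * x for w, x in zip(score_weights[name], modality_feats[name]))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(modality_feats[names[0]])
    fused = [
        sum(weight * modality_feats[name][d] for weight, name in zip(weights, names))
        for d in range(dim)
    ]
    return fused, dict(zip(names, weights))
```

Because the weights are computed per input, a modality with a low score receives a small weight and contributes little to the fused vector; returning the weights keeps that contribution visible.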

Related papers

BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
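We reformulate multimodal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross-Modality Alignment (CMA). Experiments on synthetic and real-world multimodal benchmarks demonstrate the effectiveness of our method.
Paper reference translation (metadata) (2025-06-04T08:04:58Z)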
Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation [12.455034591553506]
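Multimodal emotion recognition in conversation (MERC) can be applied to public opinion monitoring, intelligent dialogue robots, and other fields. Previous work has ignored both the inter-modal alignment process before multimodal fusion and intra-modal noise information. We develop a new method called MGLRA (Masked Graph Learning with Recursive Alignment) to address this problem.
Paper reference translation (metadata) (2024-07-23T02:23:51Z)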
Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion [14.14051929942914]
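We argue that long-range contextual semantic information should be extracted in the feature disentanglement stage and that the consistency of inter-modal semantic information should be maximized in the feature fusion stage. Inspired by recent state space models (SSMs), we propose Broad Mamba. We show that our method can overcome the computational and memory limitations of Transformers when modeling long-range context.
Paper reference translation (metadata) (2024-04-27T10:22:03Z)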
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
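We propose a new framework called AIMDiT to address the problem of multimodal fusion of deep features. Experiments with the AIMDiT framework on the public benchmark dataset MELD show improvements of 2.34% in Acc-7 and 2.87% in w-F1.
Paper reference translation (metadata) (2024-04-12T11:31:18Z)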
M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment [0.0]
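This paper proposes the M&M model, a new multimodal multitask learning framework applied to the AVCAffe dataset for cognitive load assessment. M&M uniquely integrates audiovisual cues through a dual-pathway architecture with dedicated streams for audio and video input. A key innovation is a multi-faceted multi-head attention mechanism that fuses the different modalities for synchronized multitasking.
Paper reference translation (metadata) (2024-03-14T14:49:40Z)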
MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples [63.78384552789171]
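This paper introduces MMICT, a new multimodal fine-tuning paradigm. M-Hub (Multi-Modal Hub) is a module that captures various multimodal features depending on the input and objective. Building on M-Hub, MMICT enables MM-LLMs to learn from in-context visually guided textual features and then generate outputs conditioned on text-guided visual features.
Paper reference translation (metadata) (2023-12-11T13:11:04Z)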
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts [92.76662894585809]
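We introduce an approach for enhancing multimodal models called MMoE (Mixtures of Multimodal Interaction Experts). MMoE can be applied to, and improve, various types of models.
Paper reference translation (metadata) (2023-11-16T05:31:21Z)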
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
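We build a Transformer-based framework for multimodal manipulation detection and grounding. The framework explores modality-specific features while retaining the capability for multimodal alignment. We propose implicit manipulation queries (IMQ) that adaptively aggregate global contextual cues within each modality.
Paper reference translation (metadata) (2023-09-22T06:55:41Z)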
MCM: Multi-condition Motion Synthesis Framework for Multi-scenario [28.33039094451924]
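We introduce MCM, a new paradigm for motion synthesis across multiple scenarios under diverse conditions. The MCM framework can be integrated with diffusion models such as DDPM and accepts multi-condition information as input. Our method achieves competitive results on both text-to-motion and music-to-dance tasks, comparable to task-specific methods.
Paper reference translation (metadata) (2023-09-06T14:17:49Z)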
Deep Equilibrium Multimodal Fusion [88.04713412107947]
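Multimodal fusion integrates the complementary information present in multiple modalities and has attracted considerable attention in recent years. This paper proposes a new deep equilibrium (DEQ) method for multimodal fusion that seeks the fixed point of a dynamic multimodal fusion process. Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of DEQ fusion.
Paper reference translation (metadata) (2023-06-29T03:02:20Z)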
MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations [5.5997926295092295]
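Multimodal emotion recognition in conversations (ERC) holds considerable potential for developing empathetic machines. Recent graph-based fusion methods aggregate multimodal information by exploring unimodal and cross-modal interactions within a graph. We propose the Multimodal Dynamic Fusion Network (MM-DFN).
Paper reference translation (metadata) (2022-03-04T15:42:53Z)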

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

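The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).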