Fugu-MT 論文翻訳(概要): CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition

論文の概要: CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition

arxiv url: http://arxiv.org/abs/2307.15432v2
Date: Sat, 13 Apr 2024 01:05:05 GMT
ステータス: 翻訳完了
システム内更新日: 2024-04-16 23:57:12.078490
Title: CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition
Title（参考訳）: CFN-ESA:対話感情認識のための感情シフト認識型クロスモーダルフュージョンネットワーク
Authors: Jiang Li, Xiaoping Wang, Yingjian Liu, Zhigang Zeng,
Abstract要約: 会話における感情認識のための感情シフト認識型クロスモーダルフュージョンネットワーク(CFN-ESA)を提案する。 CFN-ESAは、ユニモーダルエンコーダ(RUME)、クロスモーダルエンコーダ(ACME)、感情シフトモジュール(LESM)からなる。
参考スコア（独自算出の注目度）: 34.24557248359872
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal emotion recognition in conversation (ERC) has garnered growing attention from research communities in various fields. In this paper, we propose a Cross-modal Fusion Network with Emotion-Shift Awareness (CFN-ESA) for ERC. Extant approaches employ each modality equally without distinguishing the amount of emotional information in these modalities, rendering it hard to adequately extract complementary information from multimodal data. To cope with this problem, in CFN-ESA, we treat textual modality as the primary source of emotional information, while visual and acoustic modalities are taken as the secondary sources. Besides, most multimodal ERC models ignore emotion-shift information and overfocus on contextual information, leading to the failure of emotion recognition under emotion-shift scenario. We elaborate an emotion-shift module to address this challenge. CFN-ESA mainly consists of unimodal encoder (RUME), cross-modal encoder (ACME), and emotion-shift module (LESM). RUME is applied to extract conversation-level contextual emotional cues while pulling together data distributions between modalities; ACME is utilized to perform multimodal interaction centered on textual modality; LESM is used to model emotion shift and capture emotion-shift information, thereby guiding the learning of the main task. Experimental results demonstrate that CFN-ESA can effectively promote performance for ERC and remarkably outperform state-of-the-art models.
Abstract（参考訳）: 会話におけるマルチモーダル感情認識(ERC)は,様々な分野の研究コミュニティから注目を集めている。本稿では,感情シフト認識(CFN-ESA)を用いたERC用クロスモーダルフュージョンネットワークを提案する。既存のアプローチでは、これらのモダリティの感情情報の量を区別することなく、各モダリティを等しく使い、マルチモーダルデータから補完的な情報を適切に抽出することは困難である。この問題に対処するため、CFN-ESAでは、視覚的・音響的モダリティを二次情報源としながら、テキストのモダリティを感情情報の主源として扱う。さらに、ほとんどのマルチモーダルERCモデルは、感情シフト情報を無視し、文脈情報に重きを置いているため、感情シフトシナリオ下での感情認識の失敗につながっている。この課題に対処するために、感情シフトモジュールを詳しく説明します。 CFN-ESAは主に、ユニモーダルエンコーダ(RUME)、クロスモーダルエンコーダ(ACME)、感情シフトモジュール(LESM)から構成される。 RUMEは、モダリティ間のデータ分布をまとめながら会話レベルの文脈的感情的手がかりを抽出し、ACMEは、テキストのモダリティを中心としたマルチモーダルな相互作用を実行するために、LESMは、感情の変化をモデル化し、感情の変化情報をキャプチャするために、メインタスクの学習を導くために使用される。実験の結果,CFN-ESAはERCの性能を効果的に向上し,最先端モデルよりも優れていた。

関連論文リスト

MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition [7.944119407791842]
マルチパス・クロスモーダル相互作用(MCIHN)に基づくハイブリッドネットワークモデルを提案する。対向オートエンコーダ(AAE)は、各モードごとに別々に構築される。潜伏符号は事前に定義されたクロスモーダルゲート機構モデル(CGMM)に入力される
論文参考訳（メタデータ） (2025-10-28T16:04:03Z)
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier [53.55996102181836]
本稿では,感情関係検証器 (ERV) と説明リワードを提案する。本手法は,対象感情と明確に一致した推論をモデルに導出する。我々のアプローチは、説明と予測の整合性を高めるだけでなく、MLLMが感情的に一貫性があり、信頼できる対話を実現するのにも役立ちます。
論文参考訳（メタデータ） (2025-10-27T16:40:17Z)
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations [35.63053777817013]
GatedxLSTMは、会話におけるマルチモーダル感情認識(ERC)モデルである。話者と会話相手の双方の声と書き起こしを考慮し、感情的なシフトを駆動する最も影響力のある文章を特定する。 4クラスの感情分類において,オープンソース手法間でのSOTA(State-of-the-art)性能を実現する。
論文参考訳（メタデータ） (2025-03-26T18:46:18Z)
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
マルチモーダル・大規模言語モデル(MLLM)は、目的とするマルチモーダル認識タスクにおいて顕著な性能を達成している。しかし、主観的、感情的にニュアンスのあるマルチモーダルコンテンツを解釈する能力はほとんど解明されていない。 EmoLLMは、マルチモーダルな感情理解のための新しいモデルであり、2つのコア技術が組み込まれている。
論文参考訳（メタデータ） (2024-06-24T08:33:02Z)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
28,618粒の粗粒と4,487粒の細粒のアノテートサンプルを含むMERRデータセットを導入した。このデータセットは、さまざまなシナリオから学習し、現実のアプリケーションに一般化することを可能にする。本研究では,感情特異的エンコーダによる音声,視覚,テキスト入力をシームレスに統合するモデルであるEmotion-LLaMAを提案する。
論文参考訳（メタデータ） (2024-06-17T03:01:22Z)
Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning [40.101313334772016]
会話における感情認識の目的は、文脈情報に基づいて発話の感情カテゴリーを特定することである。従来のERC法は、クロスモーダル核融合のための単純な接続に依存していた。本稿では,ベクトル接続に基づくモーダル融合感情予測ネットワークを提案する。
論文参考訳（メタデータ） (2024-05-28T07:22:30Z)
MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis [70.06396781553191]
MM-TTS(Multimodal Emotional Text-to-Speech System)は、複数のモーダルからの感情的手がかりを利用して、高表現的で感情的に共鳴する音声を生成する統合フレームワークである。 Emotion Prompt Alignment Module (EP-Align),Emotion Embedding-induced TTS (EMI-TTS),Emotion Embedding-induced TTS (Emotion Embedding-induced TTS) の2つの主要なコンポーネントで構成されている。
論文参考訳（メタデータ） (2024-04-29T03:19:39Z)
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
AIMDiTと呼ばれる新しいフレームワークを提案し、深い特徴のマルチモーダル融合の問題を解決する。公開ベンチマークデータセットMELDでAIMDiTフレームワークを使用して行った実験では、Acc-7とw-F1メトリクスの2.34%と2.87%の改善が明らかにされた。
論文参考訳（メタデータ） (2024-04-12T11:31:18Z)
EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
本稿では,ERCタスクに対する感情的慣性・伝染型依存性モデリング手法(EmotionIC)を提案する。 EmotionICは3つの主要コンポーネント、すなわちIDマスク付きマルチヘッド注意(IMMHA)、対話型Gated Recurrent Unit(DiaGRU)、Skip-chain Conditional Random Field(SkipCRF)から構成されている。実験結果から,提案手法は4つのベンチマークデータセットにおいて,最先端のモデルよりも大幅に優れていることが示された。
論文参考訳（メタデータ） (2023-03-20T13:58:35Z)
M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
視覚,音声,テキストのモダリティから感情関連特徴を抽出するマルチモーダルフュージョンネットワーク(M2FNet)を提案する。マルチヘッドアテンションに基づく融合機構を用いて、入力データの感情に富んだ潜在表現を結合する。提案する特徴抽出器は,音声および視覚データから感情関連特徴を学習するために,適応的マージンに基づく新しい三重項損失関数を用いて訓練される。
論文参考訳（メタデータ） (2022-06-05T14:18:58Z)
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
本稿では,音声とテキストのモダリティから,伝達学習モデルと微調整モデルとを融合したニューラルネットワークによる感情認識フレームワークを提案する。本稿では,対話型感情的モーションキャプチャー・データセットにおけるマルチモーダル・アプローチの有効性を評価する。
論文参考訳（メタデータ） (2022-02-16T00:23:42Z)
Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts [2.443125107575822]
会話における感情認識(ERC)は重要かつ活発な研究課題である。最近の研究は、ERCタスクに複数のモダリティを使用することの利点を示している。マルチモーダルERCモデルを提案し,感情シフト成分で拡張する。
論文参考訳（メタデータ） (2021-12-03T14:39:04Z)
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
マルチモーダル感情認識(MER)のいくつかの重要な側面について論じる。まず、広く使われている感情表現モデルと感情モダリティの簡単な紹介から始める。次に、既存の感情アノテーション戦略とそれに対応する計算タスクを要約する。最後に,実世界のアプリケーションについて概説し,今後の方向性について論じる。
論文参考訳（メタデータ） (2021-08-18T21:55:20Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。