Fugu-MT paper translation (summary): UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries


arXiv URL: http://arxiv.org/abs/2507.23372v1
Date: Thu, 31 Jul 2025 09:39:27 GMT
Status: Translation completed
Last updated in system: 2025-08-01 17:19:09.424545
Title: UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries
Authors: Yijie Zhu, Lingsen Zhang, Zitong Yu, Rui Shao, Tao Tan, Liqiang Nie,
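Abstract summary: We propose a unified framework that seamlessly integrates emotional understanding and generation. We show that UniEmo significantly outperforms state-of-the-art methods on both emotional understanding and generation tasks.
Reference score (attention score computed by this site): 61.5273479616832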
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Emotional understanding and generation are often treated as separate tasks, yet they are inherently complementary and can mutually enhance each other. In this paper, we propose UniEmo, a unified framework that seamlessly integrates these two tasks. The key challenge lies in the abstract nature of emotions, necessitating the extraction of visual representations beneficial for both tasks. To address this, we propose a hierarchical emotional understanding chain with learnable expert queries that progressively extracts multi-scale emotional features, thereby serving as a foundational step for unification. Simultaneously, we fuse these expert queries and emotional representations to guide the diffusion model in generating emotion-evoking images. To enhance the diversity and fidelity of the generated emotional images, we further introduce the emotional correlation coefficient and emotional condition loss into the fusion process. This step facilitates fusion and alignment for emotional generation guided by the understanding. In turn, we demonstrate that joint training allows the generation component to provide implicit feedback to the understanding part. Furthermore, we propose a novel data filtering algorithm to select high-quality and diverse emotional images generated by the well-trained model, which are explicitly fed back into the understanding part. Together, these generation-driven dual feedback processes enhance the model's understanding capacity. Extensive experiments show that UniEmo significantly outperforms state-of-the-art methods in both emotional understanding and generation tasks. The code for the proposed method is available at https://github.com/JiuTian-VL/UniEmo.
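The abstract describes learnable expert queries that progressively extract multi-scale emotional features, but it gives no implementation details. As a rough illustration only, the sketch below shows the generic mechanism of learnable query tokens cross-attending over image features (similar in spirit to query-based designs such as BLIP-2's Q-Former), chained across feature scales. It is not the UniEmo implementation: the function names `cross_attend` and `hierarchical_expert_queries`, the additive hand-off between scales, the absence of key/value projections, and the random initialisation are all assumptions made for brevity. The authors' repository is the place to check the actual method.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, features):
    """Each query attends over all feature tokens (scaled dot-product attention).

    queries: K vectors of dimension D; features: N vectors of dimension D.
    Returns K pooled vectors of dimension D. Keys and values are the raw
    features (no learned projections), an assumption to keep the sketch short.
    """
    d = len(features[0])
    pooled = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(d) for f in features])
        # Weighted average of the feature tokens for this query.
        pooled.append([sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)])
    return pooled


def hierarchical_expert_queries(feature_pyramid, num_queries, seed=0):
    """Hypothetical multi-scale extractor: one set of expert queries per scale.

    All scales must share the same feature dimension D. The vectors pooled at
    one scale are added to the next scale's learnable queries, so each level
    builds on the previous one (an assumed form of the "progressive chain").
    """
    rng = random.Random(seed)
    d = len(feature_pyramid[0][0])
    carried = [[0.0] * d for _ in range(num_queries)]
    outputs = []
    for features in feature_pyramid:
        # Learnable parameters in a real model; here drawn once from a seeded RNG.
        learnable = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(num_queries)]
        queries = [[a + b for a, b in zip(l, c)] for l, c in zip(learnable, carried)]
        carried = cross_attend(queries, features)
        outputs.append(carried)
    return outputs


# Usage: a toy three-level pyramid (16, 4 and 1 tokens) with dimension 8.
rng = random.Random(1)
pyramid = [[[rng.random() for _ in range(8)] for _ in range(n)] for n in (16, 4, 1)]
outs = hierarchical_expert_queries(pyramid, num_queries=4)
print([len(o) for o in outs], len(outs[-1][0]))  # prints: [4, 4, 4] 8
```

Each scale yields `num_queries` pooled vectors of dimension D. In a trained model the query vectors would be optimised jointly with the rest of the network rather than sampled once, and the pooled outputs would feed both the understanding head and the generation condition.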

Related papers

CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation [3.5418954219513625]
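Emotional Image Content Generation (EICG) aims to generate semantically clear and emotionally faithful images based on a given emotion category. We propose CoEmoGen, a novel pipeline notable for its semantic coherence and high scalability. To demonstrate scalability intuitively, we curate EmoArt, a large-scale dataset of emotional artistic images.
Reference translation (metadata) (2025-08-05T15:04:34Z)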
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding [24.884935271771624]
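Emotion-Qwen is a framework designed to enhance both emotion understanding and general vision-language reasoning. Emotion-Qwen incorporates a sophisticated hybrid architecture based on the Mixture of Experts (MoE) paradigm. We build a Video Emotion Reasoning (VER) dataset of more than 400,000 bilingual video clips with detailed descriptive annotations to further strengthen Emotion-Qwen's emotion reasoning capabilities.
Reference translation (metadata) (2025-05-10T16:15:26Z)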
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
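DICE-Talk is a framework that disentangles emotion from identity and coordinates emotions that share similar characteristics. We develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Next, we propose a correlation-enhanced emotion conditioning module with learnable emotion banks. Third, we design an emotion discrimination objective that enforces emotional consistency during the diffusion process.
Reference translation (metadata) (2025-04-25T05:28:21Z)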
An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment [15.98131469205444]
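We introduce a new framework, Audio-Visual Fusion for Brain-like Emotion Learning (AVF-BEL). In contrast to conventional brain-inspired emotion learning methods, our approach improves audio-visual emotion fusion and generation models. Experimental results show a substantial improvement in similarity to the audio-visual fusion emotion learning model.
Reference translation (metadata) (2025-02-21T14:26:58Z)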
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
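Multimodal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks. However, their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
Reference translation (metadata) (2024-06-24T08:33:02Z)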
Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought [50.13429055093534]
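Large language models (LLMs) have shown remarkable performance on various emotion recognition tasks. In this work, we propose Emotional Chain-of-Thought (ECoT) to enhance LLM performance on emotional generation tasks.
Reference translation (metadata) (2024-01-12T16:42:10Z)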
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
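Conversational speech synthesis (CSS) aims to render an utterance accurately, with appropriate prosody and emotional inflection, within a conversational context. To address data scarcity, we carefully create emotion labels in terms of category and intensity. Our model outperforms the baseline models in understanding and expressing emotions.
Reference translation (metadata) (2023-12-19T08:47:50Z)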
Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
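This paper proposes a stimuli-aware visual emotion analysis (VEA) method consisting of three stages: stimuli selection, feature extraction, and emotion prediction. To the best of our knowledge, this is the first work to introduce a stimuli selection process into VEA within an end-to-end network. Experiments show that the proposed method consistently outperforms state-of-the-art approaches on four public visual emotion datasets.
Reference translation (metadata) (2021-09-04T08:14:52Z)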

The related paper list is generated automatically from the titles and abstracts of papers on this site.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.