Fugu-MT Paper Translation (Abstract): EmoScene: A Dual-space Dataset for Controllable Affective Image Generation

Paper summary: EmoScene: A Dual-space Dataset for Controllable Affective Image Generation

arxiv url: http://arxiv.org/abs/2604.00933v1
Date: Wed, 01 Apr 2026 14:10:23 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:32.028162
Title: EmoScene: A Dual-space Dataset for Controllable Affective Image Generation
Title (reference translation): EmoScene: A dual-space dataset for controllable affective image generation
Authors: Li He, Longtai Zhang, Wenqiang Zhang, Yan Wang, Lizhe Qi
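Abstract summary: EmoScene is a large-scale dual-space emotion dataset that jointly encodes affective dimensions and perceptual attributes. It shows how discrete emotions occupy the VAD space and how affect systematically correlates with scene-level perceptual factors.
Reference score (attention level computed independently by this site): 36.90201432936213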
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image diffusion models have achieved high visual fidelity, yet precise control over scene semantics and fine-grained affective tone remains challenging. Human visual affect arises from the rapid integration of contextual meaning, including valence, arousal, and dominance, with perceptual cues such as color harmony, luminance contrast, texture variation, curvature, and spatial layout. However, current text-to-image models rarely represent affective and perceptual factors within a unified representation, which limits their ability to synthesize scenes with coherent and nuanced emotional intent. To address this gap, we construct EmoScene, a large-scale dual-space emotion dataset that jointly encodes affective dimensions and perceptual attributes, with contextual semantics provided as supporting annotations. EmoScene contains 1.2M images across more than three hundred real-world scene categories, each annotated with discrete emotion labels, continuous VAD values, perceptual descriptors and textual captions. Multi-space analyses reveal how discrete emotions occupy the VAD space and how affect systematically correlates with scene-level perceptual factors. To benchmark EmoScene, we provide a lightweight reference baseline that injects dual-space controls into a frozen diffusion backbone via shallow cross-attention modulation, serving as a reproducible probe of affect controllability enabled by dual-space supervision.
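Abstract (reference translation): Text-to-image diffusion models have achieved high visual fidelity, but precise control over scene semantics and fine-grained affective tone remains difficult. Human visual affect arises from the rapid integration of contextual meaning, such as valence, arousal, and dominance, with perceptual cues such as color harmony, luminance contrast, texture variation, curvature, and spatial layout. However, current text-to-image models rarely represent affective and perceptual factors within a unified representation, which limits their ability to synthesize scenes with coherent and nuanced emotional intent. To address this gap, we construct EmoScene, a large-scale dual-space emotion dataset that jointly encodes affective dimensions and perceptual attributes, with contextual semantics provided as annotations. EmoScene contains 1.2M images across more than 300 real-world scene categories, each annotated with discrete emotion labels, continuous VAD values, perceptual descriptors, and textual captions. Multi-space analyses reveal how discrete emotions occupy the VAD space and how affect systematically correlates with scene-level perceptual factors. To benchmark EmoScene, we provide a lightweight reference baseline that injects dual-space controls into a frozen diffusion backbone.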
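To make the annotation types named in the abstract concrete, here is a minimal Python sketch of one record (discrete emotion label, continuous VAD values, one perceptual descriptor, caption) and of the two analyses the abstract mentions: where each discrete emotion sits in VAD space, and how an affective dimension correlates with a perceptual factor. The class name `AffectRecord`, its field names, the value ranges, and all toy values are assumptions made for illustration. They are not the EmoScene schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass
class AffectRecord:
    """Hypothetical record; field names are assumptions, not the EmoScene schema."""
    scene: str
    emotion: str               # discrete emotion label
    vad: tuple                 # (valence, arousal, dominance), assumed in [0, 1]
    luminance_contrast: float  # one example perceptual descriptor, assumed in [0, 1]
    caption: str


def vad_centroids(records):
    """Return the mean VAD vector for each discrete emotion label."""
    groups = defaultdict(list)
    for r in records:
        groups[r.emotion].append(r.vad)
    return {
        emotion: tuple(sum(axis) / len(vads) for axis in zip(*vads))
        for emotion, vads in groups.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up toy values, for illustration only.
records = [
    AffectRecord("beach", "contentment", (0.8, 0.3, 0.6), 0.4, "a calm beach at sunset"),
    AffectRecord("meadow", "contentment", (0.6, 0.5, 0.6), 0.3, "a quiet meadow in soft light"),
    AffectRecord("alley", "fear", (0.2, 0.8, 0.2), 0.9, "a dark alley at night"),
    AffectRecord("forest", "fear", (0.4, 0.6, 0.4), 0.7, "a foggy forest path"),
]

# Where each discrete emotion sits in VAD space (per-label centroid).
centroids = vad_centroids(records)

# How one affective dimension (arousal) relates to one perceptual factor.
arousal_vs_contrast = pearson(
    [r.vad[1] for r in records],
    [r.luminance_contrast for r in records],
)
```

Per-label centroids and a plain Pearson coefficient are used here only because they are the simplest statistics that fit the description; the abstract does not say which statistics the paper's multi-space analyses use.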

Related papers

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling [2.8037951156321377]
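This paper proposes a multimodal emotion recognition framework for the expression task of the 10th ABAW challenge. The framework builds on large-scale pretrained models for visual and audio representation learning and integrates them into a unified multimodal architecture. Experimental results on the ABAW 10th EXPR benchmark demonstrate the effectiveness of the proposed method.
Paper reference translation (metadata) (2026-03-12T14:20:29Z)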
EmoLat: Text-driven Image Sentiment Transfer via Emotion Latent Space [8.453871826832478]
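EmoLat is a novel emotion latent space that enables fine-grained, text-driven image sentiment transfer. Within EmoLat, an emotion semantic graph is constructed to capture the relational structure among emotions, objects, and visual attributes. A cross-modal sentiment transfer framework built on EmoLat manipulates image sentiment by jointly using text and EmoLat features.
Paper reference translation (metadata) (2026-01-17T15:07:36Z)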
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis [61.87711517626139]
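EmoVerse is a large-scale open-source dataset that enables interpretable visual emotion analysis. With over 219k images, the dataset additionally includes two kinds of annotation: Categorical Emotion States (CES) and Dimensional Emotion Space (DES).
Paper reference translation (metadata) (2025-11-16T11:16:50Z)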
SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion [74.70024991949269]
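We introduce SceneAdapt, a framework that injects scene awareness into text-conditioned motion models. The key idea is to use motion inbetweening, which can be learned without text, as a proxy task that bridges two distinct datasets. The results show that SceneAdapt effectively injects scene awareness into text-to-motion models.
Paper reference translation (metadata) (2025-10-14T23:42:10Z)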
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection [50.57849622045192]
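This paper proposes VAEmo, an efficient framework for emotion-centric VA representation learning with external knowledge injection. VAEmo achieves state-of-the-art performance with a compact design, highlighting the benefits of unified cross-modal encoding and emotion-aware semantic guidance.
Paper reference translation (metadata) (2025-05-05T03:00:51Z)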
SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
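We propose the Scene-Object Interrelated Visual Emotion Reasoning Network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build an emotion graph based on semantic concepts and visual features. We also design a Scene-Object Fusion Module that integrates scenes and objects, using scene features to guide the fusion of object features through the proposed scene-based attention mechanism.
Paper reference translation (metadata) (2021-10-24T02:41:41Z)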
Emosaic: Visualizing Affective Content of Text at Varying Granularity [0.0]
Emosaic is a tool for visualizing the emotional tone of text. It is based on a three-dimensional model of human emotion.
Paper reference translation (metadata) (2020-02-24T07:25:01Z)

The related papers list is generated automatically from the titles and abstracts of papers on this site.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.