Fugu-MT Paper Translation (Summary): Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception

Paper summary: Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception

arxiv url: http://arxiv.org/abs/2603.11556v1
Date: Thu, 12 Mar 2026 05:22:53 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.907129
Title: Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception
Authors: Xinyu Nan, Ning Wang, Yuyao Zhai, Mei Yang
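Abstract summary: Image aesthetic enhancement aims to perceive aesthetic deficiencies in images and perform corresponding editing operations. Although recent advances in image editing models have significantly improved their controllability and flexibility, they struggle to enhance image aesthetics. We propose Dual-supervised Image Aesthetic Enhancement (DIAE), a diffusion-based generative model with multimodal aesthetic perception.
Reference score (independently computed attention score): 6.873293280691424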
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image aesthetic enhancement aims to perceive aesthetic deficiencies in images and perform corresponding editing operations, which is highly challenging and requires the model to possess creativity and aesthetic perception capabilities. Although recent advancements in image editing models have significantly enhanced their controllability and flexibility, they struggle with enhancing image aesthetics. The primary challenges are twofold: first, following editing instructions with aesthetic perception is difficult, and second, there is a scarcity of "perfectly-paired" images that have consistent content but distinct aesthetic qualities. In this paper, we propose Dual-supervised Image Aesthetic Enhancement (DIAE), a diffusion-based generative model with multimodal aesthetic perception. First, DIAE incorporates Multimodal Aesthetic Perception (MAP) to convert the ambiguous aesthetic instruction into explicit guidance by (i) employing detailed, standardized aesthetic instructions across multiple aesthetic attributes, and (ii) utilizing multimodal control signals derived from text-image pairs that maintain consistency within the same aesthetic attribute. Second, to mitigate the lack of "perfectly-paired" images, we collect an "imperfectly-paired" dataset called IIAEData, consisting of images with varying aesthetic qualities while sharing identical semantics. To better leverage the weak matching characteristics of IIAEData during training, a dual-branch supervision framework is also introduced for weakly supervised image aesthetic enhancement. Experimental results demonstrate that DIAE outperforms the baselines and obtains superior image aesthetic scores and image content consistency scores.

Related papers