Fugu-MT Paper Translation (Abstract): Neural Scene Designer: Self-Styled Semantic Image Manipulation

Paper overview: Neural Scene Designer: Self-Styled Semantic Image Manipulation

arxiv url: http://arxiv.org/abs/2509.01405v1
Date: Mon, 01 Sep 2025 11:59:03 GMT
Status: Translation complete
System update date: 2025-09-04 15:17:03.677902
Title: Neural Scene Designer: Self-Styled Semantic Image Manipulation
Title (reference translation): Neural Scene Designer: Semantic Image Manipulation
Authors: Jianman Lin, Tianshui Chen, Chunmei Qing, Zhijing Yang, Shuangping Huang, Yuheng Ren, Liang Lin
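Abstract summary: We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions. NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
Reference score (independently computed attention measure): 67.43125248646653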
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Maintaining stylistic consistency is crucial for the cohesion and aesthetic appeal of images, a fundamental requirement in effective image editing and inpainting. However, existing methods primarily focus on the semantic control of generated content, often neglecting the critical task of preserving this consistency. In this work, we introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions while ensuring both semantic alignment with user intent and stylistic consistency with the surrounding environment. NSD leverages an advanced diffusion model, incorporating two parallel cross-attention mechanisms that separately process text and style information to achieve the dual objectives of semantic control and style consistency. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module. This module is predicated on the intuitive premise that different regions within a single image share a consistent style, whereas regions from different images exhibit distinct styles. The PSRL module employs a style contrastive loss that encourages high similarity between representations from the same image while enforcing dissimilarity between those from different images. Furthermore, to address the lack of standardized evaluation protocols for this task, we establish a comprehensive benchmark. This benchmark includes competing algorithms, dedicated style-related metrics, and diverse datasets and settings to facilitate fair comparisons. Extensive experiments conducted on our benchmark demonstrate the effectiveness of the proposed framework.
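Abstract (reference translation): Maintaining stylistic consistency is essential to the cohesion and aesthetic appeal of images. However, existing methods focus mainly on semantic control of the generated content and often neglect the important task of preserving this consistency. In this work, we introduce the Neural Scene Designer (NSD), a new framework that allows photo-realistic manipulation of user-specified scene regions while ensuring semantic alignment with user intent and style consistency with the surrounding environment. NSD builds on an advanced diffusion model and incorporates two parallel cross-attention mechanisms that process text and style information separately, achieving the dual objectives of semantic control and style consistency. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module. This module rests on the intuitive premise that different regions within a single image share a consistent style, while regions from different images exhibit different styles. The PSRL module uses a style contrastive loss that promotes high similarity between representations from the same image while enforcing dissimilarity with those from different images. In addition, to address the lack of a standardized evaluation protocol for this task, we build a comprehensive benchmark. The benchmark includes competing algorithms, dedicated style-related metrics, and a variety of datasets and settings that facilitate fair comparison. Experiments on this benchmark demonstrate the effectiveness of the proposed method.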
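
The style contrastive loss described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so the version below assumes an InfoNCE-style objective over L2-normalized region embeddings: regions cropped from the same image are treated as positives and regions from other images as negatives. The function names, the `temperature` parameter, and the toy embeddings are all hypothetical and are not taken from the paper; in practice the embeddings would come from a learned style encoder that is not shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def style_contrastive_loss(embeddings, image_ids, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the paper's exact loss).

    embeddings: list of style vectors, one per image region.
    image_ids:  list of ints naming the image each region was cropped from.
    Regions with the same image id are pulled together; all others are pushed apart.
    """
    if len(embeddings) != len(image_ids):
        raise ValueError("embeddings and image_ids must have the same length")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to every other region.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        positives = [j for j in logits if image_ids[j] == image_ids[i]]
        if not positives:
            continue  # this region has no same-image partner in the batch
        # Numerically stable log-sum-exp over all other regions (the denominator).
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        for j in positives:
            total += log_denom - logits[j]  # equals -log softmax probability of the positive
            count += 1
    if count == 0:
        raise ValueError("need at least two regions from the same image")
    return total / count


# Toy example: two regions per image, with made-up 2-D style vectors.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = style_contrastive_loss(regions, [0, 0, 1, 1])
loss_inconsistent = style_contrastive_loss(regions, [0, 1, 0, 1])
print(loss_consistent < loss_inconsistent)
```

When regions that share an image id already have similar embeddings, the loss is small; assigning the same embeddings to mismatched image ids makes it large, which is the behavior the abstract attributes to the PSRL objective.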

Related papers