Related papers: Neural Scene Designer: Self-Styled Semantic Image Manipulation

Neural Scene Designer: Self-Styled Semantic Image Manipulation

URL: http://arxiv.org/abs/2509.01405v1
Date: Mon, 01 Sep 2025 11:59:03 GMT
Title: Neural Scene Designer: Self-Styled Semantic Image Manipulation
Authors: Jianman Lin, Tianshui Chen, Chunmei Qing, Zhijing Yang, Shuangping Huang, Yuheng Ren, Liang Lin,
Abstract summary: We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions.<n>NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment.<n>To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
Score: 67.43125248646653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Maintaining stylistic consistency is crucial for the cohesion and aesthetic appeal of images, a fundamental requirement in effective image editing and inpainting. However, existing methods primarily focus on the semantic control of generated content, often neglecting the critical task of preserving this consistency. In this work, we introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions while ensuring both semantic alignment with user intent and stylistic consistency with the surrounding environment. NSD leverages an advanced diffusion model, incorporating two parallel cross-attention mechanisms that separately process text and style information to achieve the dual objectives of semantic control and style consistency. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module. This module is predicated on the intuitive premise that different regions within a single image share a consistent style, whereas regions from different images exhibit distinct styles. The PSRL module employs a style contrastive loss that encourages high similarity between representations from the same image while enforcing dissimilarity between those from different images. Furthermore, to address the lack of standardized evaluation protocols for this task, we establish a comprehensive benchmark. This benchmark includes competing algorithms, dedicated style-related metrics, and diverse datasets and settings to facilitate fair comparisons. Extensive experiments conducted on our benchmark demonstrate the effectiveness of the proposed framework.

Related papers

DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition [69.10628479553709]
We introduce DRC, a novel personalized image generation framework that enhances Large Multimodal Models (LMMs)<n> DRC explicitly extracts user style preferences and semantic intentions from history images and the reference image, respectively.<n>It involves two critical learning stages: 1) Disentanglement learning, which employs a dual-tower disentangler to explicitly separate style and semantic features, optimized via a reconstruction-driven paradigm with difficulty-aware importance sampling; and 2) Personalized modeling, which applies semantic-preserving augmentations to effectively adapt the disentangled representations for robust personalized generation.
arXiv Detail & Related papers (2025-04-24T08:10:10Z)
Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.<n>We first introduce vision guidance as a foundational spatial cue within the perturbed distribution.<n>We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process. Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework. We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature. Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story. Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities. We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
TSIT: A Simple and Versatile Framework for Image-to-Image Translation [103.92203013154403]
We introduce a simple and versatile framework for image-to-image translation. We provide a carefully designed two-stream generative model with newly proposed feature transformations. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network. A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
arXiv Detail & Related papers (2020-07-23T15:34:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.