Sketch-Guided Scene Image Generation
- URL: http://arxiv.org/abs/2407.06469v1
- Date: Tue, 9 Jul 2024 00:16:45 GMT
- Title: Sketch-Guided Scene Image Generation
- Authors: Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie
- Abstract summary: We propose a sketch-guided scene image generation framework, decomposing the task of scene image generation from sketch inputs into object-level cross-domain generation and scene-level image construction.
We employ pre-trained diffusion models to convert each single object drawing into an image of the object, inferring additional details while maintaining the sparse sketch structure.
In scene-level image construction, we generate the latent representation of the scene image using the separated background prompts.
- Score: 11.009579131371018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image models have shown an impressive ability to create high-quality and diverse images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging for diffusion models. In this study, we propose a novel sketch-guided scene image generation framework that decomposes the task of scene image generation from sketch inputs into object-level cross-domain generation and scene-level image construction. We employ pre-trained diffusion models to convert each single-object drawing into an image of the object, inferring additional details while maintaining the sparse sketch structure. To maintain the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings. In scene-level image construction, we generate the latent representation of the scene image using the separated background prompts, and then blend the generated foreground objects according to the layout of the sketch input. To ensure that the foreground objects' details remain unchanged while the scene image composes naturally, we infer the scene image on the blended latent representation using a global prompt that includes the trained identity tokens. Through qualitative and quantitative experiments, we demonstrate that the proposed approach surpasses state-of-the-art approaches in generating scene images from hand-drawn sketches.
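A minimal sketch of the scene-level construction step described in the abstract, assuming a latent-diffusion backbone; the text encoder, the denoising step, the layout boxes, and the identity token `cat*` are hypothetical placeholders, not the authors' implementation:
```python
# Scene-level construction, toy version: blend per-object latents into a
# background latent by the sketch layout, then denoise with a global prompt
# that carries the trained identity tokens. All modules are stubs.
import torch

def encode_prompt(text: str) -> torch.Tensor:
    # Placeholder for a text encoder (e.g. CLIP); returns a dummy embedding.
    return torch.randn(1, 77, 768)

def denoise_step(latent: torch.Tensor, cond: torch.Tensor, t: int) -> torch.Tensor:
    # Placeholder for one reverse-diffusion step of a pre-trained U-Net.
    return latent - 0.01 * torch.randn_like(latent)

# 1) Latent of the scene generated from the separated background prompt.
background_latent = torch.randn(1, 4, 64, 64)

# 2) Foreground object latents (one per object sketch) and their layout boxes
#    (top, left, height, width in latent coordinates) taken from the sketch.
object_latents = {"cat*": torch.randn(1, 4, 16, 16)}
layout = {"cat*": (24, 20, 16, 16)}

# 3) Blend each generated object into the background latent at its sketch location.
blended = background_latent.clone()
for token, (top, left, h, w) in layout.items():
    blended[:, :, top:top + h, left:left + w] = object_latents[token]

# 4) Denoise the blended latent with a global prompt that includes the identity
#    tokens, so object details are kept while the scene composes naturally.
global_cond = encode_prompt("a photo of a cat* sitting in a garden")
latent = blended
for t in reversed(range(10)):
    latent = denoise_step(latent, global_cond, t)
```
In the actual framework, the identity token would come from inverting the generated object image into an embedding, and the stubbed denoiser would be the pre-trained diffusion model.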
Related papers
- Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
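A rough optimization-loop sketch of what diffusion-guided inverse rendering can look like; the differentiable renderer, the guidance loss, and the lighting/tone-mapping parameterization below are assumptions for illustration, not the paper's method:
```python
# Optimize lighting and tone-mapping parameters so the composited render is
# scored well by a (stubbed) diffusion-model guidance term.
import torch

lighting = torch.randn(9, 3, requires_grad=True)    # e.g. SH lighting coefficients (assumed)
tone_map = torch.tensor([1.0], requires_grad=True)  # scalar exposure parameter (assumed)
optimizer = torch.optim.Adam([lighting, tone_map], lr=1e-2)

def render(lighting: torch.Tensor, tone_map: torch.Tensor) -> torch.Tensor:
    # Placeholder for a differentiable, physically based render of the composited scene.
    return torch.sigmoid(lighting.mean() + tone_map) * torch.ones(1, 3, 64, 64)

def diffusion_guidance_loss(image: torch.Tensor) -> torch.Tensor:
    # Placeholder for guidance from a personalized diffusion model
    # (e.g. a score-distillation-style objective); here just a dummy prior.
    return (image - 0.5).pow(2).mean()

for step in range(100):
    optimizer.zero_grad()
    composited = render(lighting, tone_map)
    loss = diffusion_guidance_loss(composited)
    loss.backward()
    optimizer.step()
```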
arXiv Detail & Related papers (2024-08-19T05:15:45Z) - Sketch-guided Image Inpainting with Partial Discrete Diffusion Process [5.005162730122933]
We introduce a novel partial discrete diffusion process (PDDP) for sketch-guided inpainting.
PDDP corrupts the masked regions of the image and reconstructs these masked regions conditioned on hand-drawn sketches.
The proposed novel transformer module accepts two inputs -- the image containing the masked region to be inpainted and the query sketch to model the reverse diffusion process.
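A toy illustration of the partial corruption idea on quantized image tokens: only tokens inside the user mask are corrupted, and a stubbed two-input transformer re-predicts them conditioned on the query sketch. Vocabulary size, shapes, and the denoiser are assumptions, not the paper's code:
```python
import torch

vocab_size = 1024                                            # assumed VQ codebook size
image_tokens = torch.randint(0, vocab_size, (1, 16 * 16))    # quantized image
sketch_tokens = torch.randint(0, vocab_size, (1, 16 * 16))   # quantized query sketch
mask = torch.zeros(1, 16 * 16, dtype=torch.bool)
mask[:, 100:140] = True                                      # region to be inpainted

# Forward (partial) corruption: replace only the masked tokens with random codes.
corrupted = image_tokens.clone()
corrupted[mask] = torch.randint(0, vocab_size, (int(mask.sum()),))

def transformer_denoiser(img_tok, sketch_tok, t):
    # Placeholder for the two-input transformer modelling the reverse process.
    return torch.randint(0, vocab_size, img_tok.shape)

# Reverse process: iteratively re-predict masked tokens, conditioned on the sketch;
# unmasked regions are never touched.
tokens = corrupted
for t in reversed(range(8)):
    pred = transformer_denoiser(tokens, sketch_tokens, t)
    tokens = torch.where(mask, pred, tokens)
```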
arXiv Detail & Related papers (2024-04-18T07:07:38Z) - DiffMorph: Text-less Image Morphing with Diffusion Models [0.0]
DiffMorph synthesizes images that mix concepts without the use of textual prompts.
DiffMorph takes an initial image with conditioning artist-drawn sketches to generate a morphed image.
We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully.
arXiv Detail & Related papers (2024-01-01T12:42:32Z) - Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models [80.75258849913574]
In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images.
arXiv Detail & Related papers (2023-06-08T17:02:15Z) - Text-Guided Scene Sketch-to-Photo Synthesis [5.431298869139175]
We propose a method for scene-level sketch-to-photo synthesis with text guidance.
To train our model, we use self-supervised learning from a set of photographs.
Experiments show that the proposed method translates original sketch images, i.e., freehand sketches that were not extracted from color images, into photos with compelling visual quality.
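One common way to set up such self-supervision, shown here as an assumption rather than the paper's exact pipeline, is to derive pseudo-sketches (edge maps) from the training photographs themselves:
```python
import cv2
import numpy as np

# Stand-in for a training photograph.
photo = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

gray = cv2.cvtColor(photo, cv2.COLOR_RGB2GRAY)
pseudo_sketch = 255 - cv2.Canny(gray, 100, 200)   # inverted edges as a sketch proxy

# (photo, pseudo_sketch, caption) triples would then supervise the conditional model.
```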
arXiv Detail & Related papers (2023-02-14T08:13:36Z) - Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch [7.719705312172286]
Scene Designer is a novel method for searching and generating images using free-hand sketches of scene compositions.
Our core contribution is a single unified model to learn both a cross-modal search embedding for matching sketched compositions to images, and an object embedding for layout synthesis.
arXiv Detail & Related papers (2021-08-16T21:40:16Z) - Neural Scene Graphs for Dynamic Scenes [57.65413768984925]
We present the first neural rendering method that decomposes dynamic scenes into scene graphs.
We learn implicitly encoded scenes, combined with a jointly learned latent representation to describe objects with a single implicit function.
arXiv Detail & Related papers (2020-11-20T12:37:10Z) - Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation [19.657440527538547]
In this work, we propose a novel deep learning model to alter a complex urban scene by removing a user-specified portion of the image.
Inspired by recent works on image inpainting, our proposed method leverages semantic segmentation to model the content and structure of the image.
To generate reliable results, we design a new decoder block that combines the semantic segmentation and image generation tasks.
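A hypothetical decoder block with a shared upsampling path and two heads, one for image generation and one for semantic segmentation, illustrating how the two tasks could be coupled; it is not the paper's actual architecture:
```python
import torch
import torch.nn as nn

class SegGenDecoderBlock(nn.Module):
    """Shared decoder features feeding both a generation head and a segmentation head."""
    def __init__(self, in_ch: int, out_ch: int, num_classes: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.rgb_head = nn.Conv2d(out_ch, 3, kernel_size=3, padding=1)
        self.seg_head = nn.Conv2d(out_ch, num_classes, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor):
        feat = torch.relu(self.up(x))
        return self.rgb_head(feat), self.seg_head(feat)   # image and segmentation jointly

block = SegGenDecoderBlock(64, 32, num_classes=19)
rgb, seg = block(torch.randn(1, 64, 32, 32))               # (1, 3, 64, 64), (1, 19, 64, 64)
```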
arXiv Detail & Related papers (2020-10-19T09:17:17Z) - SketchEmbedNet: Learning Novel Concepts by Imitating Drawings [125.45799722437478]
We explore properties of image representations learned by training a model to produce sketches of images.
We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting.
arXiv Detail & Related papers (2020-08-27T16:43:28Z) - Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes [54.836331922449666]
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrated the superiority of our proposed method over state-of-the-art approaches.
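A toy coarse-to-fine loop illustrating the idea of re-evaluating segmentation guidance at every scale; each module below is a placeholder standing in for SGE-Net's actual components:
```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 3, 128, 128)
mask = torch.zeros(1, 1, 128, 128)
mask[:, :, 40:80, 40:80] = 1.0                      # hole to be inpainted

def estimate_segmentation(img: torch.Tensor) -> torch.Tensor:
    # Placeholder segmentation head (random scores over 8 classes).
    return torch.softmax(torch.rand(1, 8, *img.shape[-2:]), dim=1)

def inpaint_at_scale(img, seg, msk):
    # Placeholder generator conditioned on the current segmentation guess.
    fill = seg.mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
    return img * (1 - msk) + fill * msk

result = image
for scale in (32, 64, 128):                         # coarse to fine
    img_s = F.interpolate(image, size=(scale, scale))
    msk_s = F.interpolate(mask, size=(scale, scale))
    res_s = F.interpolate(result, size=(scale, scale))
    seg_s = estimate_segmentation(res_s)            # segmentation re-evaluated at this scale
    result = inpaint_at_scale(img_s, seg_s, msk_s)  # and used to guide the inpainting
```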
arXiv Detail & Related papers (2020-03-15T17:49:20Z) - SketchyCOCO: Image Generation from Freehand Scene Sketches [71.85577739612579]
We introduce the first method for automatic image generation from scene-level freehand sketches.
The key contribution is an attribute vector bridged Generative Adversarial Network called EdgeGAN.
We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
arXiv Detail & Related papers (2020-03-05T14:54:10Z)