Diffusion Self-Guidance for Controllable Image Generation
- URL: http://arxiv.org/abs/2306.00986v3
- Date: Sun, 11 Jun 2023 23:36:38 GMT
- Title: Diffusion Self-Guidance for Controllable Image Generation
- Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski
- Abstract summary: Self-guidance provides greater control over generated images by guiding the internal representations of diffusion models.
We show how a simple set of properties can be composed to perform challenging image manipulations.
We also show that self-guidance can be used to edit real images.
- Score: 106.59989386924136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale generative models are capable of producing high-quality images
from detailed text descriptions. However, many aspects of an image are
difficult or impossible to convey through text. We introduce self-guidance, a
method that provides greater control over generated images by guiding the
internal representations of diffusion models. We demonstrate that properties
such as the shape, location, and appearance of objects can be extracted from
these representations and used to steer sampling. Self-guidance works similarly
to classifier guidance, but uses signals present in the pretrained model
itself, requiring no additional models or training. We show how a simple set of
properties can be composed to perform challenging image manipulations, such as
modifying the position or size of objects, merging the appearance of objects in
one image with the layout of another, composing objects from many images into
one, and more. We also show that self-guidance can be used to edit real images.
For results and an interactive demo, see our project page at
https://dave.ml/selfguidance/
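
The abstract describes the mechanism only at a high level: properties such as object position are read out of the model's internal representations (e.g., cross-attention maps), and the gradient of an energy on those properties is folded into the sampling update, much like classifier guidance. The sketch below illustrates one plausible form of that update for a single property (the centroid of an attention map) on top of classifier-free guidance. The `eps_model` interface, attention-map shape, token index, scales, and sign convention are all assumptions made for illustration, not the paper's actual implementation; see the project page for the real method.

```python
import torch

def self_guidance_step(eps_model, x_t, t, cond, uncond, sigma_t,
                       cfg_scale=7.5, sg_scale=4.0, target_centroid=(0.3, 0.5)):
    """Noise prediction for one denoising step with a hedged self-guidance term.

    `eps_model(x, t, c)` is assumed to return (noise_prediction, attention_maps),
    with `attention_maps` of shape (B, H, W, num_tokens). These names and shapes
    are illustrative placeholders, not the paper's API.
    """
    # Classifier-free guidance needs both unconditional and conditional predictions.
    with torch.no_grad():
        eps_uncond, _ = eps_model(x_t, t, uncond)

    # Track gradients w.r.t. x_t so the guidance energy can be differentiated.
    x_t = x_t.detach().requires_grad_(True)
    eps_cond, attn = eps_model(x_t, t, cond)

    # Example internal property: centroid of one token's attention map (object position).
    token_attn = attn[..., 3]                                   # assumed object-token index
    token_attn = token_attn / token_attn.sum(dim=(1, 2), keepdim=True)
    h = torch.linspace(0, 1, token_attn.shape[1], device=token_attn.device)
    w = torch.linspace(0, 1, token_attn.shape[2], device=token_attn.device)
    cy = (token_attn.sum(dim=2) * h).sum(dim=1)                 # current vertical centroid
    cx = (token_attn.sum(dim=1) * w).sum(dim=1)                 # current horizontal centroid

    # Guidance energy: squared distance between current and target object position.
    ty, tx = target_centroid
    energy = ((cy - ty) ** 2 + (cx - tx) ** 2).sum()
    grad = torch.autograd.grad(energy, x_t)[0]

    # Classifier-free guidance plus a gradient step on the internal property.
    # The sign of the guidance term depends on the sampler's convention.
    eps_hat = eps_uncond + cfg_scale * (eps_cond - eps_uncond) + sg_scale * sigma_t * grad
    return eps_hat.detach()
```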
Related papers
- PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions [66.92809850624118]
PixWizard is an image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form language instructions.
We tackle a variety of vision tasks within a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning dataset.
Our experiments demonstrate that PixWizard not only shows impressive generative and understanding abilities for images with diverse resolutions but also exhibits promising generalization capabilities with unseen tasks and human instructions.
arXiv Detail & Related papers (2024-09-23T17:59:46Z)
- EraseDraw: Learning to Insert Objects by Erasing Them from Images [24.55843674256795]
Prior works often fail by making global changes to the image, inserting objects in unrealistic spatial locations, and generating inaccurate lighting details.
We observe that while state-of-the-art models perform poorly on object insertion, they can remove objects and erase the background in natural images very well.
We show compelling results on diverse insertion prompts and images across various domains.
arXiv Detail & Related papers (2024-08-31T18:37:48Z)
- Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how this issue can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model that fills the masked area with a desired object class by guiding the diffusion process with the object outline.
arXiv Detail & Related papers (2024-02-26T09:21:17Z)
- PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z)
- SINE: SINgle Image Editing with Text-to-Image Diffusion Models [10.67527134198167]
This work aims to address the problem of single-image editing.
We propose a novel model-based guidance built upon classifier-free guidance.
We show promising editing capabilities, including changing style, content addition, and object manipulation.
arXiv Detail & Related papers (2022-12-08T18:57:13Z)
- Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z)
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators [63.85888518950824]
We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
arXiv Detail & Related papers (2021-08-02T14:46:46Z)
- Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised representation learning framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.