InstanceDiffusion: Instance-level Control for Image Generation
- URL: http://arxiv.org/abs/2402.03290v1
- Date: Mon, 5 Feb 2024 18:49:17 GMT
- Title: InstanceDiffusion: Instance-level Control for Image Generation
- Authors: Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar,
Ishan Misra
- Abstract summary: InstanceDiffusion adds precise instance-level control to text-to-image diffusion models.
We propose three major changes to text-to-image models that enable precise instance-level control.
- Score: 89.31908006870422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models produce high quality images but do not offer
control over individual instances in the image. We introduce InstanceDiffusion
that adds precise instance-level control to text-to-image diffusion models.
InstanceDiffusion supports free-form language conditions per instance and
allows flexible ways to specify instance locations such as simple single
points, scribbles, bounding boxes or intricate instance segmentation masks, and
combinations thereof. We propose three major changes to text-to-image models
that enable precise instance-level control. Our UniFusion block enables
instance-level conditions for text-to-image models, the ScaleU block improves
image fidelity, and our Multi-instance Sampler improves generations for
multiple instances. InstanceDiffusion significantly surpasses specialized
state-of-the-art models for each location condition. Notably, on the COCO
dataset, we outperform previous state-of-the-art by 20.4% AP$_{50}^\text{box}$
for box inputs, and 25.4% IoU for mask inputs.
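The abstract describes each instance as a free-form text prompt paired with a location given as a point, scribble, bounding box, or segmentation mask (or a combination of these); per the paper, the UniFusion block is what injects these instance-level conditions into the model. As a rough illustration of that input format only, here is a minimal Python sketch; the `InstanceCondition` class and its field names are hypothetical, not the authors' API.

```python
# Minimal sketch (hypothetical names, not the paper's API) of how the
# per-instance conditions described in the abstract could be structured.
from dataclasses import dataclass
from typing import Optional

@dataclass
class InstanceCondition:
    """One instance: a free-form prompt plus any mix of location formats."""
    prompt: str                                              # per-instance caption
    point: Optional[tuple[float, float]] = None              # (x, y) in [0, 1]
    scribble: Optional[list[tuple[float, float]]] = None     # polyline of (x, y) points
    box: Optional[tuple[float, float, float, float]] = None  # (x0, y0, x1, y1) in [0, 1]
    mask: Optional[list[list[int]]] = None                   # binary segmentation grid

# A scene pairs a global caption with per-instance conditions; the paper
# states that points, scribbles, boxes, masks, and combinations thereof
# are all supported as location inputs.
scene_prompt = "a living room with a cat and a lamp"
instances = [
    InstanceCondition(prompt="an orange tabby cat sleeping",
                      box=(0.10, 0.55, 0.45, 0.95)),
    InstanceCondition(prompt="a brass floor lamp, lit",
                      point=(0.80, 0.30)),
]
```

Keeping every location format as an optional field on one record mirrors the abstract's claim that the formats can be freely mixed per instance.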
Related papers
- Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation [0.0]
Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions.
We propose ObjectDiffusion, a model that conditions T2I diffusion models on semantic and spatial grounding information.
arXiv Detail & Related papers (2025-01-15T22:55:26Z)
- DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation [63.63429658282696]
We propose DynamicControl, which supports dynamic combinations of diverse control signals.
We show that DynamicControl is superior to existing methods in terms of controllability, generation quality and composability under various conditional controls.
arXiv Detail & Related papers (2024-12-04T11:54:57Z)
- LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation [17.169772329737913]
We present LocRef-Diffusion, a tuning-free model capable of personalized customization of multiple instances' appearance and position within an image.
To enhance the precision of instance placement, we introduce a layout-net, which controls instance generation locations.
To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features.
arXiv Detail & Related papers (2024-11-22T08:44:39Z) - Generating Compositional Scenes via Text-to-image RGBA Instance Generation [82.63805151691024]
Text-to-image diffusion generative models can generate high quality images at the cost of tedious prompt engineering.
We propose a novel multi-stage generation paradigm that is designed for fine-grained control, flexibility and interactivity.
Our experiments show that our RGBA diffusion model is capable of generating diverse and high quality instances with precise control over object attributes.
arXiv Detail & Related papers (2024-11-16T23:44:14Z) - Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability [5.767984430681467]
We propose Free-Mask, a framework that combines a diffusion model for segmentation with advanced image editing capabilities.
Results show that Free-Mask achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
arXiv Detail & Related papers (2024-11-04T05:39:01Z) - A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z) - Self-Supervised Open-Ended Classification with Small Visual Language
Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
By using models with approximately 1B parameters, we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z) - MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation [104.03166324080917]
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
Our method is training-free and does not rely on any label supervision.
Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models.
arXiv Detail & Related papers (2023-09-22T17:59:42Z) - Few-Shot Diffusion Models [15.828257653106537]
We present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs.
FSDM is trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information.
We empirically show that FSDM can perform few-shot generation and transfer to new datasets.
arXiv Detail & Related papers (2022-05-30T23:20:33Z)