Training-Free Location-Aware Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2304.13427v1
- Date: Wed, 26 Apr 2023 10:25:15 GMT
- Title: Training-Free Location-Aware Text-to-Image Synthesis
- Authors: Jiafeng Mao, Xueting Wang
- Abstract summary: We analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm.
Our method outperforms state-of-the-art methods on both control capacity and image quality.
- Score: 8.503001932363704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current large-scale generative models generate high-quality images
from text prompts with impressive efficiency. However, they lack the
ability to precisely control the size and position of objects in the generated
image. In this study, we analyze the generative mechanism of the Stable
Diffusion model and propose a new interactive generation paradigm that allows
users to specify the position of generated objects without additional training.
Moreover, we propose an object detection-based evaluation metric to assess the
control capability on the location-aware generation task. Our experimental results
show that our method outperforms state-of-the-art methods on both control
capacity and image quality.
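The abstract does not spell out the control mechanism, but a natural training-free reading, consistent with its analysis of Stable Diffusion, is to steer the cross-attention between image locations and the object's prompt tokens. The sketch below biases pre-softmax cross-attention logits toward a user-specified region; the function name, tensor shapes, and bias strength are illustrative assumptions, not the paper's verified implementation.

```python
# Hedged sketch, not the paper's exact algorithm: a common training-free way
# to steer object location in Stable Diffusion is to bias the cross-attention
# logits of the object's text tokens toward a user-specified region. All names
# and parameters here are illustrative assumptions.
import torch

def edit_cross_attention_logits(
    logits: torch.Tensor,       # (heads, H*W, num_tokens) pre-softmax scores
    region_mask: torch.Tensor,  # (H*W,) float mask: 1 inside the user box, 0 outside
    token_ids: list[int],       # prompt token positions for the target object
    strength: float = 5.0,
) -> torch.Tensor:
    """Encourage the object's tokens to attend inside the region and
    discourage them elsewhere, before softmax is applied."""
    bias = (region_mask * 2.0 - 1.0) * strength   # +strength inside, -strength outside
    edited = logits.clone()
    for t in token_ids:
        edited[:, :, t] = edited[:, :, t] + bias  # bias broadcasts over heads
    return edited
```

Such a hook would typically be applied during the early denoising steps, where the spatial layout of the image is decided.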
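The proposed detection-based metric is likewise only named, not specified. One plausible instantiation is to run an off-the-shelf detector on each generated image and score the IoU between the user's target box and the best detection of the prompted class; `detect` below is a hypothetical stand-in for any such detector.

```python
# Hedged sketch of a detection-based control metric. The abstract proposes
# scoring location control with an object detector but gives no formula; one
# plausible version is mean IoU between the user box and the detector's best
# box for the prompted class. detect(img) is assumed to return a list of
# {"label": str, "box": (x1, y1, x2, y2)} dicts.

def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def location_score(images, target_boxes, target_class, detect):
    """Mean IoU between each user box and the best detection of the class."""
    scores = []
    for img, box in zip(images, target_boxes):
        dets = [d for d in detect(img) if d["label"] == target_class]
        best = max((iou(d["box"], box) for d in dets), default=0.0)
        scores.append(best)
    return sum(scores) / max(len(scores), 1)
```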
Related papers
- Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach [13.262064234892282]
Small object generation has been limited due to difficulties in aligning cross-modal attention maps between text and small objects.
Our approach offers a training-free method that significantly mitigates this alignment issue with local and global attention guidance.
Preliminary results demonstrate the effectiveness of our method, showing marked improvements in the fidelity and accuracy of small object generation compared to existing models.
arXiv Detail & Related papers (2024-11-03T12:38:23Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
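As a concrete reading of the interpolation step above: early in denoising, source and target attention features can be linearly blended so the dragged layout inherits the original appearance. The schedule and shapes below are assumptions for illustration, not DiffUHaul's exact recipe.

```python
# Hedged sketch of early-step attention-feature interpolation: blend features
# from the source pass and the dragged-layout pass, then stop blending once
# layout is settled. The blend_until cutoff and linear ramp are assumptions.
import torch

def blend_attention_features(
    feat_source: torch.Tensor,  # attention features from the source image pass
    feat_target: torch.Tensor,  # features from the pass with the new layout
    step: int,
    num_steps: int,
    blend_until: float = 0.3,   # only blend during the first 30% of steps
) -> torch.Tensor:
    cutoff = max(int(blend_until * num_steps), 1)
    if step >= cutoff:
        return feat_target                        # late steps: leave untouched
    alpha = step / cutoff                         # ramps 0 -> 1 across early steps
    return (1.0 - alpha) * feat_source + alpha * feat_target
```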
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Generate Anything Anywhere in Any Scene [25.75076439397536]
We propose a controllable text-to-image diffusion model for personalized object generation.
Our approach demonstrates significant potential for various applications, such as those in art, entertainment, and advertising design.
arXiv Detail & Related papers (2023-06-29T17:55:14Z)
- Localized Text-to-Image Generation for Free via Cross Attention Control [154.06530917754515]
We show that localized generation can be achieved by simply controlling cross attention maps during inference.
Our proposed cross attention control (CAC) provides new open-vocabulary localization abilities to standard text-to-image models.
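A minimal sketch of what such inference-time control can look like, under the assumption that CAC restricts each localized token's attention to its region: mask the token's post-softmax attention column and renormalize. Shapes and the renormalization choice are illustrative, not the paper's verified code.

```python
# Hedged sketch of localized generation via cross-attention masking: spatial
# positions outside a token's region get zero attention to that token, and
# each query row is renormalized to sum to 1. Shapes are assumptions.
import torch

def mask_cross_attention(
    attn: torch.Tensor,                     # (heads, H*W, num_tokens), rows sum to 1
    region_masks: dict[int, torch.Tensor],  # token index -> (H*W,) binary mask
) -> torch.Tensor:
    out = attn.clone()
    for token, mask in region_masks.items():
        out[:, :, token] = out[:, :, token] * mask   # zero attention off-region
    out = out / out.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # renormalize rows
    return out
```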
arXiv Detail & Related papers (2023-06-26T12:15:06Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- On the Robustness of Quality Measures for GANs [136.18799984346248]
This work evaluates the robustness of quality measures of generative models, such as the Inception Score (IS) and the Fréchet Inception Distance (FID).
We show that such metrics can also be manipulated by additive pixel perturbations.
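To make concrete what such perturbations target, here is the standard FID computation between two sets of Inception features: the score compares Gaussian fits (mean and covariance) of the features, so small additive pixel changes that shift those features can move it. Feature extraction is assumed to happen elsewhere (e.g., InceptionV3 pooling activations).

```python
# Standard Fréchet Inception Distance between two feature sets, fitted as
# Gaussians: ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2}).
# Obtaining the (n_samples, dim) Inception features is assumed done elsewhere.
import numpy as np
from scipy import linalg

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):   # numerical noise can yield tiny
        covmean = covmean.real     # imaginary parts; drop them
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```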
arXiv Detail & Related papers (2022-01-31T06:43:09Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm that can adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- Controlling generative models with continuous factors of variations [1.7188280334580197]
We introduce a new method to find meaningful directions in the latent space of any generative model.
Our method does not require human annotations and is well suited for the search of directions encoding simple transformations of the generated image.
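Once such a direction has been found, applying it is simple: move the latent code along the direction and regenerate, as in the sketch below. The `generate` callable is a hypothetical stand-in for any generator's forward pass.

```python
# Hedged sketch of traversing a latent direction: edits that encode a simple
# transformation reduce to z + alpha * d. Interfaces here are assumptions.
import numpy as np

def traverse_direction(generate, z: np.ndarray, d: np.ndarray, alphas):
    """Render the image series obtained by moving z along direction d."""
    d = d / (np.linalg.norm(d) + 1e-8)          # unit-normalize the direction
    return [generate(z + a * d) for a in alphas]

# Example: images = traverse_direction(G, z, d, alphas=np.linspace(-3, 3, 7))
```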
arXiv Detail & Related papers (2020-01-28T10:04:04Z)