Localizing Object-level Shape Variations with Text-to-Image Diffusion
Models
- URL: http://arxiv.org/abs/2303.11306v2
- Date: Sun, 13 Aug 2023 03:50:42 GMT
- Title: Localizing Object-level Shape Variations with Text-to-Image Diffusion
Models
- Authors: Or Patashnik, Daniel Garibi, Idan Azuri, Hadar Averbuch-Elor, Daniel
Cohen-Or
- Abstract summary: We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
- Score: 60.422435066544814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image models give rise to workflows which often begin with an
exploration step, where users sift through a large collection of generated
images. The global nature of the text-to-image generation process prevents
users from narrowing their exploration to a particular object in the image. In
this paper, we present a technique to generate a collection of images that
depicts variations in the shape of a specific object, enabling an object-level
shape exploration process. Creating plausible variations is challenging as it
requires control over the shape of the generated object while respecting its
semantics. A particular challenge when generating object variations is
accurately localizing the manipulation applied over the object's shape. We
introduce a prompt-mixing technique that switches between prompts along the
denoising process to attain a variety of shape choices. To localize the
image-space operation, we present two techniques that use the self-attention
layers in conjunction with the cross-attention layers. Moreover, we show that
these localization techniques are general and effective beyond the scope of
generating object variations. Extensive results and comparisons demonstrate the
effectiveness of our method in generating object variations, and the competence
of our localization techniques.
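
As a rough illustration of the prompt-mixing idea described in the abstract, the sketch below runs a manual Stable Diffusion denoising loop (using Hugging Face diffusers) and switches the conditioning prompt at a fixed step, so early steps set the overall layout and later steps inject a different object shape. The checkpoint name, prompts, switch step, and guidance scale are illustrative assumptions; the sketch omits the paper's self-/cross-attention localization and is not the authors' implementation.

```python
# Minimal prompt-mixing sketch (illustrative, not the paper's implementation):
# run a manual Stable Diffusion denoising loop and swap the text conditioning
# at a fixed step. The checkpoint, prompts, SWITCH_AT, and guidance scale are
# assumptions made for this example.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"  # example checkpoint
).to(device)

prompt_a = "a photo of a chair in a living room"          # fixes layout and context
prompt_b = "a photo of a rocking chair in a living room"  # injects a shape variation
SWITCH_AT = 10    # denoising step at which the prompt is swapped (illustrative)
NUM_STEPS = 50
GUIDANCE = 7.5

# Encode both prompts once (with unconditional embeddings for classifier-free guidance).
emb_a, neg_a = pipe.encode_prompt(prompt_a, device, 1, True)
emb_b, neg_b = pipe.encode_prompt(prompt_b, device, 1, True)

pipe.scheduler.set_timesteps(NUM_STEPS, device=device)
gen = torch.Generator(device=device).manual_seed(0)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64), generator=gen, device=device
) * pipe.scheduler.init_noise_sigma

for i, t in enumerate(pipe.scheduler.timesteps):
    # Prompt mixing: early steps see prompt A, later steps see prompt B.
    emb, neg = (emb_a, neg_a) if i < SWITCH_AT else (emb_b, neg_b)
    latent_in = pipe.scheduler.scale_model_input(torch.cat([latents] * 2), t)
    noise_pred = pipe.unet(latent_in, t, encoder_hidden_states=torch.cat([neg, emb])).sample
    uncond, cond = noise_pred.chunk(2)
    noise_pred = uncond + GUIDANCE * (cond - uncond)
    latents = pipe.scheduler.step(noise_pred, t, latents).prev_sample

# Decode to an image; the paper's attention-based localization step is omitted here.
decoded = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
image = pipe.image_processor.postprocess(decoded, output_type="pil")[0]
```

Varying prompt_b or the switch step over a batch of seeds yields a collection of shape variants; in the paper, self- and cross-attention-based localization additionally keeps the rest of the image fixed across these variants.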
Related papers
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Salient Object-Aware Background Generation using Text-Guided Diffusion Models [4.747826159446815]
We present an approach for adapting inpainting diffusion models to the salient object outpainting task, using Stable Diffusion and ControlNet architectures.
Our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
arXiv Detail & Related papers (2024-04-15T22:13:35Z)
- Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing [47.71851180196975]
Tuning-free Text-guided Image Editing (TIE) is of particular importance to application developers.
We conduct an in-depth probing analysis and demonstrate that cross-attention maps in Stable Diffusion often contain object attribution information.
In contrast, self-attention maps play a crucial role in preserving the geometric and shape details of the source image.
arXiv Detail & Related papers (2024-03-06T03:32:56Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability in producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by an open vocabulary, to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)