Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation
- URL: http://arxiv.org/abs/2302.02284v1
- Date: Sun, 5 Feb 2023 02:47:13 GMT
- Title: Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation
- Authors: Shiqi Sun, Shancheng Fang, Qian He, Wei Liu
- Abstract summary: We propose a new approach for flexible image translation by learning a layout-aware image condition together with a text condition.
Our method co-encodes images and text into a new domain during the training phase.
Experimental comparisons with state-of-the-art methods demonstrate that our model performs best in both style image translation and semantic image translation.
- Score: 12.365230063278625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are able to generate photorealistic images in arbitrary
scenes. However, when applying diffusion models to image translation, there
exists a trade-off between maintaining spatial structure and high-quality
content. In addition, existing methods mainly rely on test-time optimization or
per-image fine-tuning of the model, which is extremely time-consuming for
practical applications. To address these issues, we propose a new approach
for flexible image translation by learning a layout-aware image condition
together with a text condition. Specifically, our method co-encodes images and
text into a new domain during the training phase. In the inference stage, we
can choose images/text or both as the conditions for each time step, which
gives users more flexible control over layout and content. Experimental
comparisons with state-of-the-art methods demonstrate that our model performs
best in both style image translation and semantic image translation while
requiring the least inference time.
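
To make the per-timestep conditioning described above concrete, below is a minimal Python sketch of such an inference loop. The denoiser eps_model, the scheduler interface, and the half-and-half condition schedule are all hypothetical illustrations of choosing images/text or both as the conditions at each time step; they are not the paper's actual implementation.

import torch

def sample(eps_model, scheduler, image_cond, text_cond,
           steps=50, shape=(1, 4, 64, 64)):
    # Start from pure Gaussian noise in the model's latent/image space.
    x = torch.randn(shape)
    for i, t in enumerate(scheduler.timesteps[:steps]):
        # Illustrative schedule: early steps use the layout-aware image
        # condition to pin down spatial structure, later steps use the text
        # condition to refine content; passing both at every step is equally valid.
        use_image = i < steps // 2
        eps = eps_model(
            x, t,
            image_cond=image_cond if use_image else None,
            text_cond=text_cond if not use_image else None,
        )
        # One reverse-diffusion update (hypothetical scheduler API).
        x = scheduler.step(eps, t, x)
    return x

Varying which condition is active at which steps is what provides the flexible control over layout and content mentioned in the abstract.
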
Related papers
- Dense Text-to-Image Generation with Attention Modulation [49.287458275920514]
Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions.
We propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions.
We achieve visual results of similar quality to models specifically trained with layout conditions.
arXiv Detail & Related papers (2023-08-24T17:59:01Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models [33.993466872389085]
We develop a novel algorithm that learns image manipulations 4.5-10 times faster and applies them 8 times faster.
Our approach can adapt the pretrained model to a user-specified image and text description on the fly in just 4 seconds.
arXiv Detail & Related papers (2023-04-10T01:21:56Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- Pretraining is All You Need for Image-to-Image Translation [59.43151345732397]
We propose to use pretraining to boost general image-to-image translation.
We show that the proposed pretraining-based image-to-image translation (PITI) is capable of synthesizing images of unprecedented realism and faithfulness.
arXiv Detail & Related papers (2022-05-25T17:58:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.