Related papers: Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

URL: http://arxiv.org/abs/2406.04032v1
Date: Thu, 6 Jun 2024 13:02:00 GMT
Title: Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Authors: Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
Abstract summary: We present Zero-Painter, a framework for layout-conditional text-to-image synthesis. Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity.
Score: 63.757624792753205
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts. Our method utilizes object masks and individual descriptions, coupled with a global text prompt, to generate images with high fidelity. Zero-Painter employs a two-stage process involving our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, ensuring precise alignment of generated objects with textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes.
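
The abstract describes conditioning on per-object masks and descriptions alongside a global prompt. As a rough illustration of that general idea only, the following Python sketch restricts cross-attention so that each pixel attends to global-prompt tokens and to the tokens of its own region. This is a generic region-masked attention toy and not the paper's PACA or ReGCA blocks; the function name, the argument names, and the convention that region id 0 means "background pixel" or "global prompt token" are all assumptions made for this example.

```python
import numpy as np

def region_masked_cross_attention(queries, keys, values, pixel_region, token_region):
    """Toy region-masked cross-attention (illustrative, not the paper's method).

    queries:      (P, d) pixel features
    keys, values: (T, d) text-token features
    pixel_region: (P,) region id per pixel, 0 = background (assumed convention)
    token_region: (T,) region id per token, 0 = global prompt (assumed convention)
    """
    # Every pixel needs at least one allowed token, so require a global token.
    assert (token_region == 0).any(), "need at least one global-prompt token"

    d = queries.shape[1]
    scores = queries @ keys.T / np.sqrt(d)  # (P, T) scaled dot products

    # A pixel may attend to global tokens and to tokens of its own region.
    is_global = token_region[None, :] == 0
    same_region = token_region[None, :] == pixel_region[:, None]
    scores = np.where(is_global | same_region, scores, -np.inf)

    # Numerically stable softmax over tokens; masked entries get weight 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

With this masking, a pixel inside one object's mask receives no contribution from another object's description, which is one simple way to keep per-object text from leaking across regions.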

Related papers

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator [44.620847608977776]
Diptych Prompting is a novel zero-shot approach that reinterprets subject-driven generation as an inpainting task with precise subject alignment. Our method supports not only subject-driven generation but also stylized image generation and subject-driven image editing.
arXiv Detail & Related papers (2024-11-23T06:17:43Z)
DiffSTR: Controlled Diffusion Models for Scene Text Removal [5.790630195329777]
Scene Text Removal (STR) aims to prevent unauthorized use of text in images. STR faces several challenges, including boundary artifacts, inconsistent texture and color, and preserving correct shadows. We introduce a ControlNet diffusion model, treating STR as an inpainting task. We develop a mask pretraining pipeline to condition our diffusion model.
arXiv Detail & Related papers (2024-10-29T04:20:21Z)
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts. Specifically, a background generator is developed to produce high-fidelity and text-free natural images. We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z)
Lazy Diffusion Transformer for Interactive Image Editing [79.75128130739598]
We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications.
arXiv Detail & Related papers (2024-04-18T17:59:27Z)
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [31.819060415422353]
Diff-Text is a training-free scene text generation framework for any language. Our method outperforms existing methods in both the accuracy of text recognition and the naturalness of foreground-background blending.
arXiv Detail & Related papers (2023-12-19T15:18:40Z)
Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches. We show that this simple approach enables flexible editing that is compatible with current image generation models. Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation. Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model [27.91089554671927]
Generic image inpainting aims to complete a corrupted image by borrowing surrounding information. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape-guidance.
arXiv Detail & Related papers (2022-12-09T18:36:13Z)
SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map. We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts at any precision level. The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level. We introduce several novel techniques to address the challenges that come with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.