Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
- URL: http://arxiv.org/abs/2311.15773v3
- Date: Mon, 25 Mar 2024 17:41:23 GMT
- Title: Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
- Authors: Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu
- Abstract summary: We present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.
Specifically, following a "check-locate-rectify" pipeline, the system first analyses the prompt to generate the target layout and compares it with the intermediate outputs to automatically detect errors.
Then, by moving the located activations and making intra- and inter-map adjustments, the rectification process can be performed with negligible computational overhead.
- Score: 23.033381812631443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently achieved remarkable progress in generating realistic images. However, challenges remain in accurately understanding and synthesizing the layout requirements in the textual prompts. To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time. Specifically, following a "check-locate-rectify" pipeline, the system first analyses the prompt to generate the target layout and compares it with the intermediate outputs to automatically detect errors. Then, by moving the located activations and making intra- and inter-map adjustments, the rectification process can be performed with negligible computational overhead. To evaluate SimM over a range of layout requirements, we present a benchmark SimMBench that compensates for the lack of superlative spatial relations in existing datasets. Both quantitative and qualitative results demonstrate the effectiveness of the proposed SimM in calibrating the layout inconsistencies. Our project page is at https://simm-t2i.github.io/SimM.
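To make the "check-locate-rectify" loop concrete, the sketch below applies it to a single token's activation map, assumed here to be a 2D cross-attention map stored as nested lists. It is a minimal illustration under stated assumptions, not SimM's implementation: the `Box` type, the 0.5 inside-mass threshold in `check`, the half-of-peak rule in `locate`, and the centre-offset translation in `rectify` are all hypothetical choices, and the intra- and inter-map adjustments are omitted.

```python
# Hypothetical sketch of a check-locate-rectify loop on one activation map.
# Not SimM's code: thresholds, rules, and names are illustrative assumptions.
from dataclasses import dataclass

Grid = list[list[float]]


@dataclass
class Box:
    """Target or located region on the map grid, half-open: [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def mass_inside(attn: Grid, box: Box) -> float:
    """Fraction of the map's total activation that lies inside `box` (box assumed within the grid)."""
    total = sum(sum(row) for row in attn)
    inside = sum(attn[y][x] for y in range(box.y0, box.y1) for x in range(box.x0, box.x1))
    return inside / total if total > 0 else 0.0


def check(attn: Grid, target: Box, threshold: float = 0.5) -> bool:
    """Check: report a layout error when too little activation falls in the target box."""
    return mass_inside(attn, target) < threshold


def locate(attn: Grid, rel: float = 0.5) -> Box:
    """Locate: bounding box of all cells whose activation is at least `rel` times the peak."""
    peak = max(max(row) for row in attn)
    cells = [(x, y) for y, row in enumerate(attn) for x, v in enumerate(row) if v >= rel * peak]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return Box(min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def rectify(attn: Grid, found: Box, target: Box) -> Grid:
    """Rectify: shift the located cells so their box centre lands on the target centre.

    Cells outside `found` are dropped (set to zero); shifted cells that fall off the grid are lost.
    """
    h, w = len(attn), len(attn[0])
    dx = (target.x0 + target.x1) // 2 - (found.x0 + found.x1) // 2
    dy = (target.y0 + target.y1) // 2 - (found.y0 + found.y1) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(found.y0, found.y1):
        for x in range(found.x0, found.x1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = attn[y][x]
    return out


# Usage: the prompt places the object bottom-right, but its activation sits top-left.
attn = [[0.0] * 4 for _ in range(4)]
attn[0][0] = 1.0
target = Box(2, 2, 4, 4)
if check(attn, target):
    attn = rectify(attn, locate(attn), target)
print(check(attn, target))  # False: all activation now lies inside the target box
```

Here the rectification is a copy-and-shift over the located cells only, which fits the abstract's point that moving activations adds little overhead; resizing the moved region to the target box and resolving overlaps between different tokens' maps are left out of this sketch.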
Related papers
- Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model [3.8748565070264753]
We present a learning-based module capable of identifying inharmonious elements within layouts, considering overall layout harmony.
The module consistently boosts layout-generation performance when used in conjunction with various state-of-the-art discrete diffusion models (DDMs).
(2024-09-25T07:24:43Z)
- Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations [49.173541207550485]
Adaptive Meshing By Expert Reconstruction (AMBER) frames adaptive mesh generation as an imitation learning problem.
AMBER combines a graph neural network with an online data acquisition scheme to predict the projected sizing field of an expert mesh.
We experimentally validate AMBER on 2D meshes and 3D meshes provided by a human expert, closely matching the provided demonstrations and outperforming a single-step CNN baseline.
(2024-06-20T10:01:22Z)
- Bayesian Adaptive Calibration and Optimal Design [16.821341360894706]
Current machine learning approaches mostly rely on rerunning simulations over a fixed set of designs available in the observed data.
We propose a data-efficient algorithm to run maximally informative simulations within a batch-sequential process.
We show the benefits of our method when compared to related approaches across synthetic and real-data problems.
(2024-05-23T11:14:35Z)
- Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching [9.796880796900242]
Trajectory Score Matching (TSM) aims to solve the pseudo-ground-truth inconsistency problem caused by accumulated error in Interval Score Matching (ISM).
Our TSM method leverages the inversion process of DDIM to generate two paths from the same starting point for calculation.
To improve the multi-stage optimization process for high-resolution text-to-3D generation, we adopt Stable Diffusion XL for guidance.
(2024-05-18T10:41:57Z)
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress on the CIR task with advanced large vision-language (VL) models; however, they generally suffer from two main issues: a lack of labeled triplets for model training and difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
(2024-03-03T07:58:03Z)
- Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive [21.49096276631859]
Current layout-to-image (L2I) models suffer from either poor editability via text or weak alignment between the generated image and the input layout.
We propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM).
Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout.
(2024-01-16T20:31:46Z)
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation.
It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps.
It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
(2023-03-21T04:41:02Z)
- Read Pointer Meters in complex environments based on a Human-like Alignment and Recognition Algorithm [16.823681016882315]
We propose a human-like alignment and recognition algorithm to overcome these problems.
A Spatial Transformed Module (STM) is proposed to obtain the front view of images autonomously.
A Value Acquisition Module (VAM) is proposed to infer accurate meter values with an end-to-end trained framework.
(2023-02-28T05:37:04Z)
- Overlap-guided Gaussian Mixture Models for Point Cloud Registration [61.250516170418784]
Probabilistic 3D point cloud registration methods have shown competitive performance in overcoming noise, outliers, and density variations.
This paper proposes a novel overlap-guided probabilistic registration approach that computes the optimal transformation from matched Gaussian Mixture Model (GMM) parameters.
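Once Gaussian components have been matched, a standard way to recover a rigid transformation in closed form is weighted Procrustes alignment of the matched component means. The sketch below is a generic 2D illustration of that step, not the paper's algorithm: restricting to 2D (so the rotation reduces to a single angle), using component means as the points, and leaving the choice of weights to the caller (for example, mixture or overlap scores) are all assumptions; a 3D version would use an SVD instead of `atan2`.

```python
# Generic 2D weighted Procrustes on matched GMM means (illustrative assumption,
# not the overlap-guided method itself).
import math

Point = tuple[float, float]


def weighted_rigid_2d(src: list[Point], dst: list[Point], weights: list[float]) -> tuple[float, Point]:
    """Return (theta, (tx, ty)) minimising sum_i w_i * |R(theta) src_i + t - dst_i|^2.

    src[i] and dst[i] are the means of the i-th matched Gaussian pair.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted centroids of both point sets.
    px = sum(w * x for w, (x, _) in zip(weights, src)) / total
    py = sum(w * y for w, (_, y) in zip(weights, src)) / total
    qx = sum(w * x for w, (x, _) in zip(weights, dst)) / total
    qy = sum(w * y for w, (_, y) in zip(weights, dst)) / total
    # Weighted dot (cosine term) and cross (sine term) of the centred pairs.
    c = s = 0.0
    for w, (sx, sy), (dx, dy) in zip(weights, src, dst):
        ax, ay = sx - px, sy - py
        bx, by = dx - qx, dy - qy
        c += w * (ax * bx + ay * by)
        s += w * (ax * by - ay * bx)
    theta = math.atan2(s, c)  # angle maximising sum_i w_i * (b_i . R a_i)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = qx - (cos_t * px - sin_t * py)
    ty = qy - (sin_t * px + cos_t * py)
    return theta, (tx, ty)


# Usage: three means rotated by 90 degrees and then shifted by (2, 3).
theta, t = weighted_rigid_2d([(0, 0), (1, 0), (0, 1)], [(2, 3), (2, 4), (1, 3)], [1.0, 1.0, 1.0])
```

With exact correspondences the recovered angle is pi/2 and the translation is (2, 3), up to floating-point rounding.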
(2022-10-17T08:02:33Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on synthetic and real-world datasets.
(2022-05-06T16:27:14Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
(2022-02-21T15:30:14Z)
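Differentiable Binarization replaces the hard step function with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T the learned threshold map, and k an amplifying factor (set to 50 in the original DB paper). Because this is differentiable, the threshold map can be trained jointly with the segmentation network. The sketch below evaluates only that element-wise formula on nested lists; the function name and the list-of-lists representation are illustrative assumptions, and the network that predicts P and T is not shown.

```python
# Element-wise differentiable binarization (formula only; map prediction omitted).
import math


def differentiable_binarization(prob: list[list[float]], thresh: list[list[float]],
                                k: float = 50.0) -> list[list[float]]:
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob is the probability map P and thresh the threshold map T; both are assumed
    to lie in [0, 1], so |k * (P - T)| <= k and math.exp cannot overflow for k = 50.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


# Usage: pixels above their threshold approach 1, pixels below approach 0,
# and a pixel exactly at its threshold maps to 0.5.
b = differentiable_binarization([[0.9, 0.1, 0.3]], [[0.3, 0.3, 0.3]])
```

A larger k makes B closer to a hard 0/1 step but yields steeper, less informative gradients near the threshold, which is the trade-off the amplifying factor controls.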