Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
- URL: http://arxiv.org/abs/2409.16689v1
- Date: Wed, 25 Sep 2024 07:24:43 GMT
- Title: Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
- Authors: Shoma Iwai, Atsuki Osanai, Shunsuke Kitada, Shinichiro Omachi
- Abstract summary: We present a learning-based module capable of identifying inharmonious elements within layouts, considering overall layout harmony.
The module consistently boosts layout-generation performance when used in conjunction with various state-of-the-art DDMs.
- Score: 3.8748565070264753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Layout generation is a task to synthesize a harmonious layout with elements characterized by attributes such as category, position, and size. Human designers experiment with the placement and modification of elements to create aesthetic layouts; however, we observed that current discrete diffusion models (DDMs) struggle to correct inharmonious layouts after they have been generated. In this paper, we first provide novel insights into the layout sticking phenomenon in DDMs and then propose a simple yet effective layout-assessment module, Layout-Corrector, which works in conjunction with existing DDMs to address the layout sticking problem. We present a learning-based module capable of identifying inharmonious elements within layouts, considering overall layout harmony characterized by complex composition. During the generation process, Layout-Corrector evaluates the correctness of each token in the generated layout, reinitializing those with low scores to the ungenerated state. The DDM then uses the high-scored tokens as clues to regenerate the harmonized tokens. Layout-Corrector, tested on common benchmarks, consistently boosts layout-generation performance when used in conjunction with various state-of-the-art DDMs. Furthermore, our extensive analysis demonstrates that the Layout-Corrector (1) successfully identifies erroneous tokens, (2) facilitates control over the fidelity-diversity trade-off, and (3) significantly mitigates the performance drop associated with fast sampling.
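The score-and-reset step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MASK` id, the `threshold` value, and the `toy_score_fn` / `toy_denoise_fn` stand-ins are all hypothetical names introduced only for illustration, and a real setup would use a trained corrector and a trained DDM in their place.

```python
MASK = -1  # hypothetical id for the "ungenerated" (masked) state


def reset_low_score_tokens(tokens, scores, threshold):
    """Return a copy of tokens with entries scoring below threshold reset to MASK."""
    return [MASK if s < threshold else t for t, s in zip(tokens, scores)]


def corrector_step(tokens, score_fn, denoise_fn, threshold=0.5):
    """One correction step: score every token, reset low scorers, then regenerate.

    score_fn and denoise_fn are placeholders for a learned corrector and a
    discrete diffusion model; threshold is an assumed hyperparameter.
    """
    scores = score_fn(tokens)
    masked = reset_low_score_tokens(tokens, scores, threshold)
    # The denoiser sees the kept (high-scored) tokens as context for the MASK slots.
    return denoise_fn(masked)


def toy_score_fn(tokens):
    # Toy assumption: odd token ids count as inharmonious and get a low score.
    return [0.1 if t % 2 else 0.9 for t in tokens]


def toy_denoise_fn(tokens):
    # Toy assumption: every MASK slot is filled with token 0; other tokens are kept.
    return [0 if t == MASK else t for t in tokens]


layout = [2, 7, 4, 9]
result = corrector_step(layout, toy_score_fn, toy_denoise_fn)
print(result)
```

In this toy run, the two odd tokens are reset and refilled while the even tokens pass through unchanged. Raising or lowering the assumed `threshold` changes how many tokens are sent back for regeneration.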
Related papers
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experiment results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive [21.49096276631859]
Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout.
We propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM).
Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout.
arXiv Detail & Related papers (2024-01-16T20:31:46Z)
- Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation [23.033381812631443]
We present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.
Specifically, following a "check-locate-rectify" pipeline, the system first analyses the prompt to generate the target layout and compares it with the intermediate outputs to automatically detect errors.
Then, by moving the located activations and making intra- and inter-map adjustments, the rectification process can be performed with negligible computational overhead.
arXiv Detail & Related papers (2023-11-27T12:48:33Z)
- LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [84.16541551923221]
We propose a model that treats layout generation as a code generation task to enhance semantic information.
We develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules.
We attain significant state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2023-09-18T06:35:10Z)
- Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents [54.744701806413204]
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
We test whether layout-infused LMs are robust to layout distribution shifts.
arXiv Detail & Related papers (2023-06-01T18:01:33Z)
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [50.73105631853759]
We present a novel generative model named LayoutDiffusion for automatic layout generation.
It learns to reverse a mild forward process, in which layouts become increasingly chaotic with the growth of forward steps.
It enables two conditional layout generation tasks in a plug-and-play manner without re-training and achieves better performance than existing methods.
arXiv Detail & Related papers (2023-03-21T04:41:02Z)
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation [27.955214767628107]
Controllable layout generation aims at synthesizing a plausible arrangement of element bounding boxes with optional constraints.
In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models.
Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input.
arXiv Detail & Related papers (2023-03-14T17:59:47Z)
- Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work for formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
- Layout-to-Image Translation with Double Pooling Generative Adversarial Networks [76.83075646527521]
We propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically-consistent results from the input layout.
We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM).
arXiv Detail & Related papers (2021-08-29T19:55:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.