Related papers: Relation-Aware Diffusion Model for Controllable Poster Layout Generation

URL: http://arxiv.org/abs/2306.09086v2
Date: Thu, 11 Jan 2024 08:46:37 GMT
Title: Relation-Aware Diffusion Model for Controllable Poster Layout Generation
Authors: Fengheng Li, An Liu, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Jingping Shao
Abstract summary: Poster layout is a crucial aspect of poster design. In this study, we introduce a relation-aware diffusion model for poster layout generation. The proposed method can generate diverse layouts based on user constraints.
Score: 19.65249380159006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Poster layout is a crucial aspect of poster design. Prior methods primarily focus on the correlation between visual content and graphic elements. However, a pleasant layout should also consider the relationship between visual and textual contents and the relationship between elements. In this study, we introduce a relation-aware diffusion model for poster layout generation that incorporates these two relationships in the generation process. Firstly, we devise a visual-textual relation-aware module that aligns the visual and textual representations across modalities, thereby enhancing the layout's efficacy in conveying textual information. Subsequently, we propose a geometry relation-aware module that learns the geometry relationship between elements by comprehensively considering contextual information. Additionally, the proposed method can generate diverse layouts based on user constraints. To advance research in this field, we have constructed a poster layout dataset named CGL-Dataset V2. Our proposed method outperforms state-of-the-art methods on CGL-Dataset V2. The data and code will be available at https://github.com/liuan0803/RADM.

Related papers

ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models [7.288330685534444]
We introduce ReLayout, a novel method that leverages relation-CoT to generate more reasonable and coherent layouts. Specifically, we enhance layout annotations by introducing explicit relation definitions, such as region, salient, and margin between elements. We also introduce a layout prototype sampler, which defines layout prototype features across three dimensions and quantifies distinct layout styles.
arXiv Detail & Related papers (2025-07-08T01:13:43Z)
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts. We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously. To support instruction-tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
Relation Rectification in Diffusion Model [64.84686527988809]
We introduce a novel task termed Relation Rectification, aiming to refine the model to accurately represent a given relationship it initially fails to generate. We propose an innovative solution utilizing a Heterogeneous Graph Convolutional Network (HGCN). The lightweight HGCN adjusts the text embeddings generated by the text encoder, ensuring the accurate reflection of the textual relation in the embedding space.
arXiv Detail & Related papers (2024-03-29T15:54:36Z)
LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations [26.822169338351827]
We develop a framework to incorporate graph edge information from the prompt and attention mechanisms for graph-structured LLM recommendations. Our evaluation of real-world datasets demonstrates the framework's ability to understand connectivity information in graph data.
arXiv Detail & Related papers (2024-02-14T23:12:09Z)
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [84.16541551923221]
We propose a model that treats layout generation as a code generation task to enhance semantic information. We develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules. We attain significant state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2023-09-18T06:35:10Z)
A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions [50.469491454128246]
We use text as the guidance to create graphic layouts, i.e., Text-to-Layout, aiming to lower the design barriers. Text-to-Layout is a challenging task, because it needs to consider the implicit, combined, and incomplete constraints from text. We present a two-stage approach, named parse-then-place, to address this problem.
arXiv Detail & Related papers (2023-08-24T10:37:00Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements. We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers. A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image. First, a self-attention mechanism is adopted to model the contextual relationship within layout elements, while a cross-attention mechanism is used to fuse the visual information of conditional images. We construct a large-scale advertisement poster layout design dataset with delicate layout and saliency map annotations.
arXiv Detail & Related papers (2022-09-02T07:19:12Z)
VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations [40.721146438291335]
We propose a unified framework VSR for document layout analysis, combining vision, semantics and relations. On three popular benchmarks, VSR outperforms previous models by large margins.
arXiv Detail & Related papers (2021-05-13T12:20:30Z)
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding [17.179384053140236]
Document layout comprises both structural and visual (e.g., font sizes) information that is vital but often ignored by machine learning models. We propose a novel layout-aware multimodal hierarchical framework, LAMPreT, to model the blocks and the whole document. We evaluate the proposed model on two layout-aware tasks: text block filling and image suggestion.
arXiv Detail & Related papers (2021-04-16T23:27:39Z)
Relational Message Passing for Knowledge Graph Completion [78.47976646383222]
We propose a relational message passing method for knowledge graph completion. It passes relational messages among edges iteratively to aggregate neighborhood information. Results show our method outperforms state-of-the-art knowledge completion methods by a large margin.
arXiv Detail & Related papers (2020-02-17T03:33:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.