Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
- URL: http://arxiv.org/abs/2502.14005v1
- Date: Wed, 19 Feb 2025 10:06:42 GMT
- Title: Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
- Authors: Peirong Zhang, Jiaxin Zhang, Jiahuan Cao, Hongliang Li, Lianwen Jin
- Abstract summary: LGGPT is an LLM-based model tailored for unified layout generation. ALI and ULR boast a succinct structure that forgoes superfluous tokens. We show that LGGPT achieves superior or on par performance compared to existing methods.
- Score: 33.18391143075568
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify both task-generic and domain-generic layout generation hitherto unexplored. Collectively, ALI and ULR boast a succinct structure that forgoes superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, facilitating LGGPT to capture complex and variable layout generation conditions during the unified training process. Experimental results demonstrate that LGGPT achieves superior or on par performance compared to existing methods. Notably, LGGPT strikes a prominent balance between proficiency and efficiency with a compact 1.5B parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and suggest that 1.5B could be an optimal parameter size by comparing LLMs of varying scales. Code is available at https://github.com/NiceRingNode/LGGPT.
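To make the abstract's format comparison concrete, below is a minimal sketch contrasting a verbose HTML-style layout serialization with a condensed one that drops markup and omits placeholder coordinates for elements the model is asked to place. It is an illustration only: the element names, coordinate scheme, and separators are assumptions, not the actual ALI/ULR or IQE token format.

```python
# Illustrative sketch only: the exact ALI/ULR and IQE token formats are not given in
# the abstract, so the element names, coordinate scheme, and separators below are
# assumptions chosen to show why dropping HTML wrappers and placeholders shortens input.

def html_style(elements):
    """Verbose HTML-like serialization of the kind used by earlier layout formats."""
    rows = [
        f'<div class="{cls}" style="left:{x}px; top:{y}px; width:{w}px; height:{h}px"></div>'
        for cls, (x, y, w, h) in elements
    ]
    return "<html><body>" + "".join(rows) + "</body></html>"

def compact_style(elements, unknown=()):
    """Condensed serialization: class token plus coordinates, no markup.
    Elements whose geometry is a generation target keep only their class token
    instead of being padded with placeholder coordinates."""
    parts = []
    for i, (cls, box) in enumerate(elements):
        if i in unknown:
            parts.append(cls)
        else:
            parts.append(cls + " " + " ".join(str(v) for v in box))
    return " | ".join(parts)

elements = [("title", (10, 5, 300, 40)), ("image", (10, 60, 300, 200))]
print(len(html_style(elements)), "chars (HTML-style)")
print(len(compact_style(elements)), "chars (compact)")
print(compact_style(elements, unknown={1}))  # e.g. ask the model to place the image
```

Even in this toy case the condensed string is several times shorter than the HTML one, which is the kind of token saving the abstract credits to ALI/ULR and IQE.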
Related papers
- LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation [12.616508576956136]
Conditional layout generation aims to automatically generate visually appealing and semantically coherent layouts from user-defined constraints.
We propose a novel approach that leverages the reasoning capabilities of Large Language Models (LLMs) through a combination of Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) techniques.
We conduct extensive experiments on five public datasets spanning three conditional layout generation tasks.
arXiv Detail & Related papers (2025-04-15T03:12:01Z) - Efficient Multi-Instance Generation with Janus-Pro-Driven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt-parsing module that bridges text understanding and layout generation.
MIGLoRA is a parameter-efficient plug-in integrating Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones.
The proposed method achieves state-of-the-art performance on COCO and LVIS benchmarks while maintaining parameter efficiency.
arXiv Detail & Related papers (2025-03-27T00:59:14Z) - HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation [4.242755827806053]
This paper proposes a two-phase text-to-floorplan generation method, which guides a Large Language Model (LLM) to generate an initial layout. We incorporate a Chain-of-Thought approach to prompt the LLM based on user text specifications, enabling a more user-friendly and intuitive house layout design. Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, validating its effectiveness in practical home design applications.
arXiv Detail & Related papers (2024-11-19T06:57:45Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding (a toy prefix-merging sketch of the DAG idea appears after this list).
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - Adaptable Logical Control for Large Language Models [68.27725600175013]
Ctrl-G is an adaptable framework that facilitates tractable and flexible control of model generation at inference time.
We show that Ctrl-G, when applied to a TULU2-7B model, outperforms GPT3.5 and GPT4 on the task of interactive text editing.
arXiv Detail & Related papers (2024-06-19T23:47:59Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs). We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm [7.521690071464451]
We propose a unified framework that implements a 1+N multi-task fine-tuning pattern in large language models (LLMs).
Our work aims to take advantage of both the MTL (i.e., CGC) and PEFT (i.e., LoRA) schemes.
arXiv Detail & Related papers (2024-01-22T07:58:31Z) - Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work in formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
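As a footnote to the graph-structured speculative decoding entry above, the toy sketch below shows the general idea of merging drafted hypotheses so that tokens shared across drafts are verified only once. It is an assumption-laden illustration using a simple prefix tree, not the paper's actual DAG construction.

```python
# Toy sketch (not the paper's algorithm): merge drafted token sequences into a prefix
# tree, a simple DAG, so tokens shared by several hypotheses are verified only once.

def build_prefix_tree(drafts):
    """Nested-dict prefix tree: hypotheses sharing a prefix share those nodes."""
    root = {}
    for draft in drafts:
        node = root
        for tok in draft:
            node = node.setdefault(tok, {})
    return root

def count_nodes(tree):
    """Number of distinct drafted tokens the target model must score."""
    return sum(1 + count_nodes(child) for child in tree.values())

drafts = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "ran"],
]
tree = build_prefix_tree(drafts)
print(count_nodes(tree), "tokens to verify vs.", sum(map(len, drafts)), "if checked draft by draft")
```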
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.