TextLap: Customizing Language Models for Text-to-Layout Planning
- URL: http://arxiv.org/abs/2410.12844v1
- Date: Wed, 09 Oct 2024 19:51:38 GMT
- Title: TextLap: Customizing Language Models for Text-to-Layout Planning
- Authors: Jian Chen, Ruiyi Zhang, Yufan Zhou, Jennifer Healey, Jiuxiang Gu, Zhiqiang Xu, Changyou Chen
- Abstract summary: We call our method TextLap (text-based layout planning).
It uses a curated instruction-based layout planning dataset (InsLap) to customize Large language models (LLMs) as a graphic designer.
We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, on image generation and graphical design benchmarks.
- Score: 65.02105936609021
- Abstract: Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the remarkable ability of large language models (LLMs) in both natural language understanding and generation, we believe that we could customize an LLM to help people create compelling graphical layouts starting with only text instructions from the user. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs as a graphic designer. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, on image generation and graphical design benchmarks.
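To make the text-to-layout idea concrete, below is a minimal, hypothetical sketch of what an instruction-based layout-planning example and its parsed output could look like. The field names, the normalized [x0, y0, x1, y1] box convention, and the `parse_layout` helper are illustrative assumptions; the abstract does not specify the actual InsLap schema.

```python
import json

# Hypothetical instruction/response pair for layout planning; the real
# InsLap format is not described in the abstract.
example = {
    "instruction": (
        "Design a poster layout with a title at the top, a hero image "
        "in the center, and a call-to-action button below it."
    ),
    "response": json.dumps([
        {"label": "title",  "bbox": [0.10, 0.05, 0.90, 0.15]},  # normalized [x0, y0, x1, y1]
        {"label": "image",  "bbox": [0.15, 0.20, 0.85, 0.70]},
        {"label": "button", "bbox": [0.35, 0.78, 0.65, 0.88]},
    ]),
}

def parse_layout(llm_output: str):
    """Parse a model's text response back into (label, bbox) pairs."""
    return [(el["label"], tuple(el["bbox"])) for el in json.loads(llm_output)]

for label, bbox in parse_layout(example["response"]):
    print(f"{label:>6}: {bbox}")
```

Training on pairs like this is what lets a plain text-in, text-out LLM act as a layout planner: the layout itself is just another string to generate.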
Related papers
- HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation [4.242755827806053]
This paper proposes a two-phase text-to-floorplan generation method, which guides a Large Language Model (LLM) to generate an initial layout.
We incorporate a Chain-of-Thought approach to prompt the LLM based on user text specifications, enabling a more user-friendly and intuitive house layout design.
Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, validating its effectiveness in practical home design applications.
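To illustrate the two-phase idea, here is a minimal Python sketch of chain-of-thought-style prompting split across two calls. `call_llm` is a placeholder for any chat-completion client, and both prompts are illustrative guesses rather than HouseLLM's actual prompts.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call."""
    raise NotImplementedError("plug in an LLM client here")

def generate_floorplan(spec: str) -> str:
    # Phase 1: reason step by step about rooms, sizes, and adjacencies.
    reasoning = call_llm(
        "Think step by step about the rooms, their approximate sizes, and "
        f"which rooms should be adjacent, given this specification:\n{spec}"
    )
    # Phase 2: turn the reasoning into a concrete initial layout.
    return call_llm(
        f"Using this reasoning:\n{reasoning}\n"
        "Emit an initial floorplan as a JSON list of rooms with bounding boxes."
    )
```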
arXiv Detail & Related papers (2024-11-19T06:57:45Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two modeling techniques to reduce the computation required to process multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
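The mention of structured text in JSON format suggests a serialization along the lines of the hypothetical example below; the schema and field names are assumptions, not PosterLLaVa's actual format.

```python
import json

# Hypothetical JSON layout target; a compact string like this gives the
# model a plain-text output space that is trivially parseable afterwards.
target = {
    "canvas": {"width": 1024, "height": 1536},
    "elements": [
        {"type": "headline", "box": [64, 96, 960, 224]},    # pixel [x0, y0, x1, y1]
        {"type": "product",  "box": [192, 300, 832, 1100]},
        {"type": "logo",     "box": [64, 1380, 256, 1472]},
    ],
}

print(json.dumps(target, separators=(",", ":")))
```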
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z) - Painter: Teaching Auto-regressive Language Models to Draw Sketches [5.3445140425713245]
We present Painter, an LLM that can convert textual user prompts into sketches.
We create a dataset of diverse multi-object sketches paired with textual prompts.
Although this is early, unprecedented work in using LLMs for auto-regressive image generation, the results are very encouraging.
arXiv Detail & Related papers (2023-08-16T17:18:30Z) - LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in a style-sheet language.
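As a rough illustration of composing demonstrations in a style-sheet language, the sketch below serializes a layout as CSS-like rules that could be prepended to a prompt as a worked example. The element names and property set are illustrative assumptions, not LayoutGPT's verbatim format.

```python
# Serialize (name, (left, top, width, height)) layout elements as CSS-like
# rules for use as an in-context demonstration.
def to_css(elements):
    rules = []
    for name, (left, top, width, height) in elements:
        rules.append(
            f"{name} {{ left: {left}px; top: {top}px; "
            f"width: {width}px; height: {height}px; }}"
        )
    return "\n".join(rules)

demo = to_css([
    ("car",  (32, 180, 260, 120)),
    ("tree", (320, 60, 140, 240)),
])
print(demo)  # appended to the prompt before the actual query
```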
arXiv Detail & Related papers (2023-05-24T17:56:16Z) - A Picture is Worth a Thousand Words: Language Models Plan from Pixels [53.85753597586226]
Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments.
In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments.
arXiv Detail & Related papers (2023-03-16T02:02:18Z)