TextLap: Customizing Language Models for Text-to-Layout Planning
- URL: http://arxiv.org/abs/2410.12844v1
- Date: Wed, 09 Oct 2024 19:51:38 GMT
- Title: TextLap: Customizing Language Models for Text-to-Layout Planning
- Authors: Jian Chen, Ruiyi Zhang, Yufan Zhou, Jennifer Healey, Jiuxiang Gu, Zhiqiang Xu, Changyou Chen,
- Abstract summary: We call our method TextLap (text-based layout planning)
It uses a curated instruction-based layout planning dataset (InsLap) to customize Large language models (LLMs) as a graphic designer.
We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, for image generation and graphical design benchmarks.
- Score: 65.02105936609021
- License:
- Abstract: Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the incredible ability of Large language models (LLMs) in both natural language understanding and generation, we believe that we could customize an LLM to help people create compelling graphical layouts starting with only text instructions from the user. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs as a graphic designer. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, for image generation and graphical design benchmarks.
Related papers
- AI Based Font Pair Suggestion Modelling For Graphic Design [0.0]
One of the key challenges of AI generated designs in Microsoft Designer is selecting the most contextually relevant and novel fonts for the design suggestions.
In this work we create font visual embeddings, a font stroke width algorithm, a font category to font mapping dataset and a lightweight, low latency knowledge-distilled mini language model (Mini LM V2) to recommend multiple pairs of contextual heading and subheading fonts.
arXiv Detail & Related papers (2025-01-19T07:08:36Z) - Personalized Graph-Based Retrieval for Large Language Models [51.7278897841697]
We propose a framework that leverages user-centric knowledge graphs to enrich personalization.
By directly integrating structured user knowledge into the retrieval process and augmenting prompts with user-relevant context, PGraph enhances contextual understanding and output quality.
We also introduce the Personalized Graph-based Benchmark for Text Generation, designed to evaluate personalized text generation tasks in real-world settings where user history is sparse or unavailable.
arXiv Detail & Related papers (2025-01-04T01:46:49Z) - HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation [4.242755827806053]
This paper proposes a two-phase text-to-floorplan generation method, which guides a Large Language Model (LLM) to generate an initial layout.
We incorporate a Chain-of-Thought approach to prompt the LLM based on user text specifications, enabling a more user-friendly and intuitive house layout design.
Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, validating its effectiveness in practical home design applications.
arXiv Detail & Related papers (2024-11-19T06:57:45Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction-tuning of out model, we construct two extensive text logo datasets, which are 5x more larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - LayoutGPT: Compositional Visual Planning and Generation with Large
Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
arXiv Detail & Related papers (2023-05-24T17:56:16Z) - A Picture is Worth a Thousand Words: Language Models Plan from Pixels [53.85753597586226]
Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments.
In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments.
arXiv Detail & Related papers (2023-03-16T02:02:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.