COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design
- URL: http://arxiv.org/abs/2311.16974v2
- Date: Mon, 18 Mar 2024 21:43:20 GMT
- Title: COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design
- Authors: Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo
- Abstract summary: This paper introduces the COLE system, a hierarchical generation framework designed to address the challenges of multi-layered graphic design generation.
The COLE system can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input.
- Score: 39.809852329070466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. The creation of high-quality designs demands design-oriented planning, reasoning, and layer-wise generation. Unlike the recent CanvaGPT, which integrates GPT-4 with existing design templates to build a custom GPT, this paper introduces the COLE system, a hierarchical generation framework designed to comprehensively address these challenges. The COLE system can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input. Examples of such input might include directives like "design a poster for Hisaishi's concert." The key insight is to dissect the complex task of text-to-design generation into a hierarchy of simpler sub-tasks, each addressed by specialized models working collaboratively. The results from these models are then consolidated to produce a cohesive final output. Our hierarchical task decomposition can streamline the complex process and significantly enhance generation reliability. Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the generation of images and text. Furthermore, we construct the DESIGNINTENTION benchmark to demonstrate the superiority of our COLE system over existing methods in generating high-quality graphic designs from user intent. Finally, we present a Canva-like multi-layered image editing tool to support flexible editing of the generated multi-layered graphic design images. We regard our COLE system as an important step towards addressing more complex, multi-layered graphic design generation tasks in the future.
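The abstract describes the decomposition in prose; the following minimal Python sketch (our illustration, not the authors' released code; every class and function name here is hypothetical) shows how such a hierarchical pipeline of specialist stages might be wired together, with placeholder functions standing in for the fine-tuned LLM planner and the layer-wise diffusion generators.

```python
# Hypothetical sketch of a COLE-style hierarchical pipeline (not the
# authors' implementation): each stage is a placeholder for a fine-tuned
# specialist model, and the per-layer results are consolidated at the end.
from dataclasses import dataclass, field


@dataclass
class Layer:
    kind: str     # e.g. "background", "object", "text"
    caption: str  # design-aware caption driving the generator
    bbox: tuple   # (x, y, w, h) from the layout planner


@dataclass
class Design:
    layers: list = field(default_factory=list)


def plan_layers(intention: str) -> list[Layer]:
    """Stand-in for the fine-tuned LLM stage: expand a vague intention
    into per-layer captions plus a layout plan."""
    return [
        Layer("background", f"concert-poster backdrop for: {intention}", (0, 0, 1024, 1024)),
        Layer("text", f"title text for: {intention}", (96, 64, 832, 192)),
    ]


def render_layer(layer: Layer):
    """Stand-in for the diffusion / text-rendering stage that generates
    one layer's pixels from its caption and bounding box."""
    return f"<image:{layer.kind}>"  # placeholder for an RGBA image


def generate(intention: str) -> Design:
    # Decompose, delegate to specialists, then consolidate the layers.
    design = Design()
    for layer in plan_layers(intention):
        design.layers.append((layer, render_layer(layer)))
    return design


print(generate("design a poster for Hisaishi's concert"))
```

Keeping each layer as a separate object is what makes the final output editable: a user edit touches one layer's caption or bbox and only that stage needs to be re-run.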
Related papers
- Group Diffusion Transformers are Unsupervised Multitask Learners [49.288489286276146]
Group Diffusion Transformers (GDTs) form a novel framework that unifies diverse visual generation tasks.
GDTs build upon diffusion transformers with minimal architectural modifications by concatenating self-attention tokens across images (a minimal sketch follows this entry).
We evaluate GDTs on a benchmark featuring over 200 instructions across 30 distinct visual generation tasks.
arXiv Detail & Related papers (2024-10-19T07:53:15Z)
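To make the token-concatenation idea concrete, here is a toy PyTorch sketch (our illustration, not the GDT code; tensor sizes are arbitrary): token sequences from several images are concatenated along the sequence axis, so ordinary self-attention attends across the whole group with minimal architectural changes.

```python
# Toy illustration of cross-image self-attention via token concatenation.
import torch
import torch.nn as nn

batch, n_images, tokens_per_image, dim = 2, 3, 64, 128
image_tokens = torch.randn(batch, n_images, tokens_per_image, dim)

# Concatenate the per-image token sequences into one long sequence.
group_tokens = image_tokens.reshape(batch, n_images * tokens_per_image, dim)

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
out, _ = attn(group_tokens, group_tokens, group_tokens)  # attends across images
print(out.shape)  # torch.Size([2, 192, 128])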
- Multimodal Markup Document Models for Graphic Design Completion [23.009208137043178]
This paper presents multimodal markup document models (MarkupDM) that can generate both markup language and images within interleaved multimodal documents.
Unlike existing vision-and-language multimodal models, our MarkupDM tackles unique challenges critical to graphic design tasks.
We design an image quantizer to tokenize images of diverse sizes with transparency, and modify a code language model to process markup languages and incorporate image modalities (a toy sketch follows this entry).
arXiv Detail & Related papers (2024-09-27T18:00:01Z)
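A toy sketch of the interleaving idea (our illustration, not the MarkupDM implementation; the codebook size, token format, and markup snippet are all invented): an RGBA-aware quantizer maps pixels to discrete ids, which can then be interleaved with markup tokens in a single training sequence.

```python
# Toy RGBA quantizer plus markup/image token interleaving (illustrative only).
import torch

codebook = torch.randn(512, 4)  # hypothetical codebook of RGBA entries


def quantize_rgba(image: torch.Tensor) -> torch.Tensor:
    """Map each RGBA pixel to its nearest codebook index, so transparency
    (the alpha channel) survives quantization."""
    flat = image.reshape(-1, 4)           # (N, RGBA)
    dists = torch.cdist(flat, codebook)   # (N, 512) pairwise distances
    return dists.argmin(dim=-1)           # discrete token ids


rgba = torch.rand(8, 8, 4)  # tiny image with an alpha channel
image_ids = quantize_rgba(rgba)

# Interleave markup tokens and image tokens into one sequence, as a
# markup-aware language model would consume them.
sequence = (["<svg>", "<image", "href="]
            + [f"<img:{i}>" for i in image_ids[:4].tolist()]
            + ["/>", "</svg>"])
print(sequence)
```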
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts (an example JSON representation follows this entry).
We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
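As an illustration of the structured-text representation, here is a hypothetical JSON layout (the field names are our assumptions, not PosterLLaVa's actual schema): the model emits the serialized string token by token, and parsing it back recovers machine-readable geometry for rendering.

```python
# Hypothetical JSON layout representation (schema invented for illustration).
import json

layout = {
    "canvas": {"width": 1024, "height": 1536},
    "elements": [
        {"type": "image", "bbox": [0, 0, 1024, 1536], "role": "background"},
        {"type": "text", "bbox": [112, 96, 800, 160], "role": "title"},
        {"type": "text", "bbox": [112, 288, 600, 80], "role": "subtitle"},
    ],
}

# The serialized form is what the model would generate; round-tripping it
# recovers structured geometry.
serialized = json.dumps(layout)
assert json.loads(serialized)["elements"][1]["role"] == "title"
print(serialized)
```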
- Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources.
We introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts.
Our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with mIoU 12% higher on Crello (a minimal IoU sketch follows this entry).
arXiv Detail & Related papers (2024-04-23T17:58:33Z)
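For reference, mean IoU over matched element boxes can be computed as below; this is a minimal sketch of the standard metric, and the paper's exact matching protocol between predicted and ground-truth elements may differ.

```python
# Minimal mean-IoU sketch over pre-matched (prediction, ground truth) boxes.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mean_iou(preds, gts):
    """Assumes preds[i] is already matched to gts[i]."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(gts)


print(mean_iou([(0, 0, 10, 10)], [(5, 5, 15, 15)]))  # 0.142857...
```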
- Graphic Design with Large Multimodal Model [38.96206668552293]
Hierarchical Layout Generation (HLG) is a more flexible and pragmatic setup, which creates graphic compositions from unordered sets of design elements.
To tackle the HLG task, we introduce Graphist, the first layout generation model based on large multimodal models.
Graphist efficiently reframes HLG as a sequence generation problem, taking RGB-A images as input (a conceptual sketch follows this entry).
arXiv Detail & Related papers (2024-04-22T17:20:38Z)
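A conceptual sketch of the sequence-generation reframing (not Graphist's code; the decoder step is a random stand-in for a large multimodal model): unordered RGB-A elements go in, and the model emits one placement per step, so layering order falls out of the sequence order.

```python
# Conceptual sketch: hierarchical layout generation as sequence generation.
import torch

elements = [torch.rand(4, 64, 64) for _ in range(3)]  # unordered RGBA crops


def fake_decoder_step(history, element_feat):
    """Stand-in for one autoregressive step of a multimodal decoder."""
    x, y = torch.rand(2).tolist()
    return {"x": round(x, 2), "y": round(y, 2), "z": len(history)}


placements = []
for feat in elements:  # in a real model, the visiting order is model-chosen
    placements.append(fake_decoder_step(placements, feat))
print(placements)  # z-order (layering) equals the generation order
```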
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims to arrange pre-defined elements within the space of a given canvas.
We propose design sequence formation (DSF), which reorganizes elements in layouts to imitate the design processes of human designers (a toy reordering sketch follows this entry).
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate well-formed layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z)
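A toy sketch of the design-sequence-formation idea (the ordering rules here are invented for illustration; the paper's actual DSF differs): reorder unordered elements into a plausible designer-like sequence, e.g. backdrop first, then large visuals, then text.

```python
# Toy designer-like reordering of unordered layout elements.
elements = [
    {"type": "text", "area": 4_000},
    {"type": "image", "area": 1_000_000},
    {"type": "logo", "area": 12_000},
]

# Hypothetical priority: imagery before logos before text; larger first.
order = {"image": 0, "logo": 1, "text": 2}
dsf = sorted(elements, key=lambda e: (order.get(e["type"], 3), -e["area"]))
print([e["type"] for e in dsf])  # ['image', 'logo', 'text']
```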
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' intentions.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- The Layout Generation Algorithm of Graphic Design Based on Transformer-CVAE [8.052709336750823]
This paper applied the Transformer model and a conditional variational autoencoder (CVAE) to the graphic design layout generation task.
It proposed an end-to-end graphic design layout generation model named LayoutT-CVAE (a minimal sketch of the idea follows this entry).
Compared with existing state-of-the-art models, the layouts generated by our model perform better on many metrics.
arXiv Detail & Related papers (2021-10-08T13:36:02Z)
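A minimal sketch of the Transformer-CVAE idea (our illustration, not LayoutT-CVAE itself; all dimensions and the single-box decoder are simplifications): a Transformer encoder maps element boxes plus condition features to a latent distribution, and a small decoder reconstructs box coordinates from a reparameterized sample.

```python
# Minimal Transformer-based conditional VAE for layouts (illustrative only).
import torch
import torch.nn as nn


class TinyLayoutCVAE(nn.Module):
    def __init__(self, dim=64, latent=16):
        super().__init__()
        self.embed = nn.Linear(4 + 8, dim)  # box (4) + condition (8)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.to_mu = nn.Linear(dim, latent)
        self.to_logvar = nn.Linear(dim, latent)
        self.decode = nn.Sequential(
            nn.Linear(latent + 8, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, boxes, cond):
        h = self.encoder(self.embed(torch.cat([boxes, cond], dim=-1)))
        mu, logvar = self.to_mu(h.mean(1)), self.to_logvar(h.mean(1))
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # A real model would decode one box per element (e.g. autoregressively);
        # this toy decodes a single box per layout to keep the sketch short.
        return self.decode(torch.cat([z, cond[:, 0]], dim=-1)), mu, logvar


model = TinyLayoutCVAE()
boxes = torch.rand(2, 5, 4)  # batch of two 5-element layouts
cond = torch.rand(2, 5, 8)   # per-element condition features
out, mu, logvar = model(boxes, cond)
print(out.shape)  # torch.Size([2, 4])
```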