PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
- URL: http://arxiv.org/abs/2506.10741v1
- Date: Thu, 12 Jun 2025 14:28:12 GMT
- Title: PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
- Authors: SiXiang Chen, Jianyu Lai, Jialin Gao, Tian Ye, Haoyu Chen, Hengyu Shi, Shitong Shao, Yunlong Lin, Song Fei, Zhaohu Xing, Yeying Jin, Junfeng Luo, Xiaoming Wei, Lei Zhu,
- Abstract summary: PosterCraft is a unified framework that abandons prior modular pipelines and rigid, predefined layouts.<n>It employs a carefully designed, cascaded workflow to optimize the generation of high-aesthetic posters.<n>PosterCraft significantly outperforms open-source baselines in rendering accuracy, layout coherence, and overall visual appeal.
- Score: 26.60241017305203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating aesthetic posters is more challenging than simple design images: it requires not only precise text rendering but also the seamless integration of abstract artistic content, striking layouts, and overall stylistic harmony. To address this, we propose PosterCraft, a unified framework that abandons prior modular pipelines and rigid, predefined layouts, allowing the model to freely explore coherent, visually compelling compositions. PosterCraft employs a carefully designed, cascaded workflow to optimize the generation of high-aesthetic posters: (i) large-scale text-rendering optimization on our newly introduced Text-Render-2M dataset; (ii) region-aware supervised fine-tuning on HQ-Poster100K; (iii) aesthetic-text-reinforcement learning via best-of-n preference optimization; and (iv) joint vision-language feedback refinement. Each stage is supported by a fully automated data-construction pipeline tailored to its specific needs, enabling robust training without complex architectural modifications. Evaluated on multiple experiments, PosterCraft significantly outperforms open-source baselines in rendering accuracy, layout coherence, and overall visual appeal-approaching the quality of SOTA commercial systems. Our code, models, and datasets can be found in the Project page: https://ephemeral182.github.io/PosterCraft
Related papers
- CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation [13.354283356097563]
CreatiPoster is a framework that generates editable, multilayer compositions from optional natural-language instructions or assets.<n>To further research, we release a copyright-free corpus of 100,000 multi-layer designs.
arXiv Detail & Related papers (2025-06-12T16:54:39Z) - PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering [50.76106125697899]
Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers.<n>Main challenge lies in accurately rendering text, especially for complex writing systems like Chinese, which contains over 10,000 individual characters.<n>We develop TextRenderNet, which achieves a high text rendering accuracy of over 90%.<n>Based on TextRenderNet and SceneGenNet, we present PosterMaker, an end-to-end generation framework.
arXiv Detail & Related papers (2025-04-09T07:13:08Z) - POSTA: A Go-to Framework for Customized Artistic Poster Generation [87.16343612086959]
POSTA is a modular framework for customized artistic poster generation.<n>Background Diffusion creates a themed background based on user input.<n>Design MLLM then generates layout and typography elements that align with and complement the background style.<n>ArtText Diffusion applies additional stylization to key text elements.
arXiv Detail & Related papers (2025-03-19T05:22:38Z) - Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art [61.28133495240179]
We propose a novel task of aesthetics alignment which seeks to align user-specified aesthetics with the T2I generation output.<n>Inspired by how artworks provide an invaluable perspective to approach aesthetics, we codify visual aesthetics using the compositional framework artists employ.<n>We demonstrate that T2I DMs can effectively offer 10 compositional controls through user-specified PoA conditions.
arXiv Detail & Related papers (2025-03-15T06:58:09Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously.
To support instruction-tuning of out model, we construct two extensive text logo datasets, which are 5x more larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench)
MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z) - GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models [7.152732507491591]
We propose an automatic poster generation framework with text rendering capabilities leveraging LLMs.<n>This framework aims to create precise poster text within a detailed contextual background.<n>We introduce a high-resolution font dataset and a poster dataset with resolutions exceeding 1024 pixels.
arXiv Detail & Related papers (2024-07-02T13:17:49Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.<n>Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.<n>We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.