PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
- URL: http://arxiv.org/abs/2602.12127v1
- Date: Thu, 12 Feb 2026 16:16:38 GMT
- Title: PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
- Authors: Sixiang Chen, Jianyu Lai, Jialin Gao, Hengyu Shi, Zhongying Liu, Tian Ye, Junfeng Luo, Xiaoming Wei, Lei Zhu,
- Abstract summary: PosterOmni is a generalized artistic poster creation framework. It integrates the two regimes, namely local editing and global creation, within a single system. It significantly enhances reference adherence, global composition quality, and aesthetic harmony.
- Score: 30.88155039139322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-poster generation is a high-demand task requiring not only local adjustments but also high-level design understanding. Models must generate text, layout, style, and visual elements while preserving semantic fidelity and aesthetic coherence. The process spans two regimes: local editing, where ID-driven generation, rescaling, filling, and extending must preserve concrete visual entities; and global creation, where layout- and style-driven tasks rely on understanding abstract design concepts. These intertwined demands make image-to-poster a multi-dimensional process coupling entity-preserving editing with concept-driven creation under image-prompt control. To address these challenges, we propose PosterOmni, a generalized artistic poster creation framework that unlocks the potential of a base edit model for multi-task image-to-poster generation. PosterOmni integrates the two regimes, namely local editing and global creation, within a single system through an efficient data-distillation-reward pipeline: (i) constructing multi-scenario image-to-poster datasets covering six task types across entity-based and concept-based creation; (ii) distilling knowledge between local and global experts for supervised fine-tuning; and (iii) applying unified PosterOmni Reward Feedback to jointly align visual entity-preserving and aesthetic preference across all tasks. Additionally, we establish PosterOmni-Bench, a unified benchmark for evaluating both local editing and global creation. Extensive experiments show that PosterOmni significantly enhances reference adherence, global composition quality, and aesthetic harmony, outperforming all open-source baselines and even surpassing several proprietary systems.
Related papers
- CoLoGen: Progressive Learning of Concept-Localization Duality for Unified Image Generation [55.409963941827044]
CoLoGen is a unified diffusion framework that progressively learns and reconciles concept-localization duality. CoLoGen uses a staged curriculum that first builds core conceptual and localization abilities, then adapts them to diverse visual conditions, and finally refines their synergy for complex instruction-driven tasks. Experiments on editing, controllable generation, and customized generation show that CoLoGen achieves competitive or superior performance.
arXiv Detail & Related papers (2026-02-25T17:59:29Z) - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues [106.02577891104079]
We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing. Our method deconstructs creative intent into a stack of controllable visual cues.
arXiv Detail & Related papers (2025-12-02T18:59:58Z) - PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation [28.02969134846803]
We introduce the Poster Tree, a hierarchical intermediate representation that jointly encodes document structure and visual-textual relationships. Our framework employs a multi-agent collaboration strategy, where agents specializing in content summarization and layout planning iteratively coordinate and provide mutual feedback.
arXiv Detail & Related papers (2025-08-29T15:36:06Z) - PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs [16.62052847270255]
PosterGen is a multi-agent framework that mirrors the workflow of professional poster designers. It produces posters that are both semantically grounded and visually appealing. Experimental results show that PosterGen consistently matches existing methods in content fidelity and significantly outperforms them in visual design.
arXiv Detail & Related papers (2025-08-24T02:25:45Z) - PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework [26.60241017305203]
PosterCraft is a unified framework that abandons prior modular pipelines and rigid, predefined layouts. It employs a carefully designed, cascaded workflow to optimize the generation of high-aesthetic posters. PosterCraft significantly outperforms open-source baselines in rendering accuracy, layout coherence, and overall visual appeal.
arXiv Detail & Related papers (2025-06-12T14:28:12Z) - CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design [69.83433430133302]
CreatiDesign is a systematic solution for automated graphic design covering both model architecture and dataset construction. First, we design a unified multi-condition driven architecture that enables flexible and precise integration of heterogeneous design elements. Furthermore, to ensure that each condition precisely controls its designated image region, we propose a multimodal attention mask mechanism.
arXiv Detail & Related papers (2025-05-25T12:14:23Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design [39.809852329070466]
This paper introduces the COLE system - a hierarchical generation framework designed to address these challenges.
This COLE system can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input.
arXiv Detail & Related papers (2023-11-28T17:22:17Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.