BannerAgency: Advertising Banner Design with Multimodal LLM Agents
- URL: http://arxiv.org/abs/2503.11060v1
- Date: Fri, 14 Mar 2025 03:54:05 GMT
- Title: BannerAgency: Advertising Banner Design with Multimodal LLM Agents
- Authors: Heng Wang, Yotaro Shimose, Shingo Takamatsu
- Abstract summary: This paper introduces a training-free framework for fully automated banner ad design creation. We present BannerAgency, an MLLM agent system that collaborates with advertisers to understand their brand identity and banner objectives. It generates matching background images, creates blueprints for foreground design elements, and renders the final creatives as editable components in Figma or SVG formats.
- Score: 4.337357639279586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advertising banners are critical for capturing user attention and enhancing advertising campaign effectiveness. Creating aesthetically pleasing banner designs while conveying the campaign messages is challenging due to the large search space involving multiple design elements. Additionally, advertisers need multiple sizes for different displays and various versions to target different sectors of audiences. Since design is intrinsically an iterative and subjective process, flexible editability is also in high demand for practical usage. While current models have served as assistants to human designers in various design tasks, they typically handle only segments of the creative design process or produce pixel-based outputs that limit editability. This paper introduces a training-free framework for fully automated banner ad design creation, enabling frontier multimodal large language models (MLLMs) to streamline the production of effective banners with minimal manual effort across diverse marketing contexts. We present BannerAgency, an MLLM agent system that collaborates with advertisers to understand their brand identity and banner objectives, generates matching background images, creates blueprints for foreground design elements, and renders the final creatives as editable components in Figma or SVG formats rather than static pixels. To facilitate evaluation and future research, we introduce BannerRequest400, a benchmark featuring 100 unique logos paired with 400 diverse banner requests. Through quantitative and qualitative evaluations, we demonstrate the framework's effectiveness, emphasizing the quality of the generated banner designs, their adaptability to various banner requests, and their strong editability enabled by this component-based approach.
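The abstract describes the pipeline only at a high level (brand understanding, background generation, a blueprint of foreground elements, and rendering to editable Figma/SVG components). As a rough illustration of what such component-based output could look like, the following is a minimal sketch; the names BannerBlueprint, TextElement, and render_svg are hypothetical and are not the paper's actual API.

```python
# Hypothetical sketch: rendering a banner "blueprint" (foreground elements over a
# generated background) into editable SVG rather than static pixels.
# All names and fields are illustrative assumptions, not BannerAgency's code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TextElement:
    text: str
    x: int
    y: int
    font_size: int = 24
    color: str = "#222222"

@dataclass
class BannerBlueprint:
    width: int
    height: int
    background_href: str                      # path/URL of the generated background image
    elements: List[TextElement] = field(default_factory=list)

def render_svg(bp: BannerBlueprint) -> str:
    """Serialize the blueprint as an SVG string; each element remains an editable node."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{bp.width}" height="{bp.height}">',
        f'  <image href="{bp.background_href}" width="{bp.width}" height="{bp.height}"/>',
    ]
    for el in bp.elements:
        parts.append(
            f'  <text x="{el.x}" y="{el.y}" font-size="{el.font_size}" fill="{el.color}">{el.text}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)

# Example: a 728x90 leaderboard with a headline and a call-to-action.
blueprint = BannerBlueprint(
    width=728, height=90, background_href="background.png",
    elements=[
        TextElement("Summer Sale - 30% Off", x=24, y=40, font_size=28),
        TextElement("Shop Now", x=24, y=72, font_size=18, color="#ffffff"),
    ],
)
print(render_svg(blueprint))
```

Because every design element stays a separate SVG node, the advertiser can retarget sizes or swap copy after generation, which is the editability the paper emphasizes.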
Related papers
- POSTA: A Go-to Framework for Customized Artistic Poster Generation [87.16343612086959]
POSTA is a modular framework for customized artistic poster generation.
Background Diffusion creates a themed background based on user input.
Design MLLM then generates layout and typography elements that align with and complement the background style.
ArtText Diffusion applies additional stylization to key text elements.
arXiv Detail & Related papers (2025-03-19T05:22:38Z)
- CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL). Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z)
- PAID: A Framework of Product-Centric Advertising Image Design [31.08944590096747]
We propose a novel framework called Product-Centric Advertising Image Design (PAID). It consists of four sequential stages to highlight product foregrounds and taglines while achieving overall image aesthetics. To support the PAID framework, we create corresponding datasets with over 50,000 labeled images.
arXiv Detail & Related papers (2025-01-24T08:21:35Z)
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a VLM-based framework that generates content-aware text logo layouts.
We introduce two modeling techniques to reduce the computation needed to process multiple glyph images simultaneously.
To support instruction tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization [11.9449656506593]
Influencer is an interactive tool to assist novice creators in crafting high-quality promotional post designs.
Within Influencer, we contribute a multi-dimensional recommendation framework that allows users to intuitively generate new ideas.
Influencer implements a holistic promotional post design system that supports context-aware image and caption exploration.
arXiv Detail & Related papers (2024-07-20T16:27:49Z)
- MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
MetaDesigner introduces a transformative framework for artistic typography, powered by Large Language Models (LLMs).
Its foundation is a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively orchestrate the creation of customizable WordArt.
arXiv Detail & Related papers (2024-06-28T11:58:26Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners [8.508453886143677]
We demonstrate the use of text-to-image models for generating personalized web banners for online shoppers.
The novelty in this approach lies in converting users' interaction data to meaningful prompts without human intervention.
Our results show that the proposed approach can create high-quality personalized banners for users.
arXiv Detail & Related papers (2024-02-28T07:56:04Z)
- Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising [16.527943807941856]
This paper proposes a Cross-Element Combinatorial Selection (CECS) framework for multiple creative elements.
In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element.
Experiments on real-world datasets show that CECS achieves state-of-the-art results on offline metrics.
arXiv Detail & Related papers (2023-07-04T09:32:39Z)
- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z)
- Recommending Themes for Ad Creative Design via Visual-Linguistic Representations [27.13752835161338]
We propose a theme (keyphrase) recommender system for ad creative strategists.
The theme recommender is based on aggregating results from a visual question answering (VQA) task.
We show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics.
arXiv Detail & Related papers (2020-01-20T18:04:10Z)
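The VQA-based aggregation in the last entry can be pictured concretely. The sketch below is a hypothetical illustration of ranking candidate theme keyphrases by aggregating "yes" probabilities from a VQA model over an advertiser's ad images; the ask_vqa interface, question template, and scoring are assumptions for illustration, not that paper's implementation.

```python
# Hypothetical sketch of theme (keyphrase) recommendation via VQA aggregation.
# `ask_vqa` stands in for any VQA model returning P("yes") for a question about
# an image; it is an assumed interface, not the cited paper's actual system.
from typing import Callable, Dict, List

def recommend_themes(
    images: List[str],
    candidate_themes: List[str],
    ask_vqa: Callable[[str, str], float],
    top_k: int = 5,
) -> List[str]:
    """Rank candidate theme keyphrases by their average VQA 'yes' probability."""
    scores: Dict[str, float] = {}
    for theme in candidate_themes:
        question = f"Is this advertisement about {theme}?"
        answers = [ask_vqa(image, question) for image in images]
        scores[theme] = sum(answers) / len(answers)   # aggregate over the ad images
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Usage with a dummy VQA callable (replace with a real cross-modal model).
dummy_vqa = lambda image, question: 0.5
print(recommend_themes(["ad1.png"], ["travel", "luxury", "family"], dummy_vqa, top_k=2))
```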