PAID: A Framework of Product-Centric Advertising Image Design
- URL: http://arxiv.org/abs/2501.14316v2
- Date: Wed, 12 Feb 2025 06:48:03 GMT
- Title: PAID: A Framework of Product-Centric Advertising Image Design
- Authors: Hongyu Chen, Min Zhou, Jing Jiang, Jiale Chen, Yang Lu, Bo Xiao, Tiezheng Ge, Bo Zheng
- Abstract summary: We propose a novel framework called Product-Centric Advertising Image Design (PAID).
It consists of four sequential stages to highlight product foregrounds and taglines while achieving overall image aesthetics.
To support the PAID framework, we create corresponding datasets with over 50,000 labeled images.
- Abstract: Creating visually appealing advertising images is often a labor-intensive and time-consuming process. Is it possible to automatically generate such images using only basic product information--specifically, a product foreground image, taglines, and a target size? Existing methods mainly focus on parts of the problem and fail to provide a comprehensive solution. To address this gap, we propose a novel multistage framework called Product-Centric Advertising Image Design (PAID). It consists of four sequential stages to highlight product foregrounds and taglines while achieving overall image aesthetics: prompt generation, layout generation, background image generation, and graphics rendering. Different expert models are designed and trained for the first three stages: First, we use a visual language model (VLM) to generate background prompts that match the products. Next, a VLM-based layout generation model arranges the placement of product foregrounds, graphic elements (taglines and decorative underlays), and various nongraphic elements (objects from the background prompt). Following this, we train an SDXL-based image generation model that can simultaneously accept prompts, layouts, and foreground controls. To support the PAID framework, we create corresponding datasets with over 50,000 labeled images. Extensive experimental results and online A/B tests demonstrate that PAID can produce more visually appealing advertising images.
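The four sequential stages named in the abstract (prompt generation, layout generation, background image generation, graphics rendering) can be pictured as a simple pipeline. The sketch below is illustrative only: every name in it (`ProductInput`, `LayoutElement`, `generate_prompt`, `generate_layout`, `generate_background`, `render_graphics`, `run_pipeline`) is hypothetical, and each stage body is a stub standing in for the trained models the paper describes, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class ProductInput:
    """The basic product information the abstract lists as input."""
    foreground_path: str
    taglines: List[str]
    target_size: Tuple[int, int]  # (width, height)


@dataclass
class LayoutElement:
    kind: str  # e.g. "product" or "tagline" (labels are assumptions)
    box: Box
    text: str = ""


def generate_prompt(inp: ProductInput) -> str:
    # Stage 1 stub: the paper uses a VLM here; we return a fixed string.
    return "placeholder background prompt"


def generate_layout(inp: ProductInput, prompt: str) -> List[LayoutElement]:
    # Stage 2 stub: the paper uses a VLM-based layout model; we place
    # the product in the middle and stack taglines near the top.
    w, h = inp.target_size
    elements = [LayoutElement("product", (w // 4, h // 3, w // 2, h // 2))]
    for i, text in enumerate(inp.taglines):
        box = (w // 10, h // 20 + i * (h // 10), w * 8 // 10, h // 12)
        elements.append(LayoutElement("tagline", box, text))
    return elements


def generate_background(inp: ProductInput, prompt: str,
                        layout: List[LayoutElement]) -> Dict:
    # Stage 3 stub: the paper trains an SDXL-based model conditioned on
    # prompt, layout, and foreground; we only record what it would receive.
    return {"size": inp.target_size, "prompt": prompt,
            "n_controls": len(layout)}


def render_graphics(background: Dict, layout: List[LayoutElement]) -> Dict:
    # Stage 4 stub: overlay taglines according to the layout.
    texts = [e.text for e in layout if e.kind == "tagline"]
    return {**background, "rendered_taglines": texts}


def run_pipeline(inp: ProductInput) -> Dict:
    """Run the four stages in order, passing each output forward."""
    prompt = generate_prompt(inp)
    layout = generate_layout(inp, prompt)
    background = generate_background(inp, prompt, layout)
    return render_graphics(background, layout)
```

A call such as `run_pipeline(ProductInput("product.png", ["Fresh every day"], (800, 600)))` returns a dictionary describing the stubbed result. The point of the sketch is only the data flow: each stage consumes the outputs of the stages before it.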
Related papers
- CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective.
To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL).
Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z) - Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.
T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.
Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z) - Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting [28.65445105418749]
We introduce Anywhere, a pioneering multi-agent framework designed to address challenges in inpainting foreground images.
Anywhere employs various agents, including a Visual Language Model, a Large Language Model, and image generation models.
It excels in foreground-conditioned image inpainting, mitigating "over-imagination", resolving foreground-background discrepancies, and enhancing diversity.
arXiv Detail & Related papers (2024-04-29T11:13:37Z) - Planning and Rendering: Towards Product Poster Generation with Diffusion Models [21.45855580640437]
We propose a novel product poster generation framework based on diffusion models named P&R.
At the planning stage, we propose a PlanNet to generate the layout of the product and other visual components.
At the rendering stage, we propose a RenderNet to generate the background for the product while considering the generated layout.
Our method outperforms the state-of-the-art product poster generation methods on PPG30k.
arXiv Detail & Related papers (2023-12-14T11:11:50Z) - Staging E-Commerce Products for Online Advertising using Retrieval Assisted Image Generation [11.03803158931361]
We propose a generative adversarial network (GAN) based approach to generate staged backgrounds for un-staged product images.
We show how our staging approach can enable animations of moving products leading to a video ad from a product image.
arXiv Detail & Related papers (2023-07-28T06:04:46Z) - Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z) - ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z) - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer [80.61492265221817]
Graphic layout designs play an essential role in visual communication.
Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production.
Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' desires.
arXiv Detail & Related papers (2022-12-19T21:57:35Z) - Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
The Multi-modality Unified Image-sequence Classifier (MUIsC) simultaneously detects all categories of rule violations through learning.
By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a reject rate of 13.6%.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)