RefAdGen: High-Fidelity Advertising Image Generation
- URL: http://arxiv.org/abs/2508.11695v1
- Date: Tue, 12 Aug 2025 18:25:31 GMT
- Title: RefAdGen: High-Fidelity Advertising Image Generation
- Authors: Yiyun Chen, Weikai Yang,
- Abstract summary: RefAdGen is a generation framework that achieves high fidelity through a decoupled design.<n>We show that RefAdGen achieves state-of-the-art performance, showcasing robust generalization by maintaining high fidelity and remarkable visual results for both unseen products and challenging real-world, in-the-wild images.
- Score: 2.38180456064897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Artificial Intelligence Generated Content (AIGC) techniques has unlocked opportunities in generating diverse and compelling advertising images based on referenced product images and textual scene descriptions. This capability substantially reduces human labor and production costs in traditional marketing workflows. However, existing AIGC techniques either demand extensive fine-tuning for each referenced image to achieve high fidelity, or they struggle to maintain fidelity across diverse products, making them impractical for e-commerce and marketing industries. To tackle this limitation, we first construct AdProd-100K, a large-scale advertising image generation dataset. A key innovation in its construction is our dual data augmentation strategy, which fosters robust, 3D-aware representations crucial for realistic and high-fidelity image synthesis. Leveraging this dataset, we propose RefAdGen, a generation framework that achieves high fidelity through a decoupled design. The framework enforces precise spatial control by injecting a product mask at the U-Net input, and employs an efficient Attention Fusion Module (AFM) to integrate product features. This design effectively resolves the fidelity-efficiency dilemma present in existing methods. Extensive experiments demonstrate that RefAdGen achieves state-of-the-art performance, showcasing robust generalization by maintaining high fidelity and remarkable visual results for both unseen products and challenging real-world, in-the-wild images. This offers a scalable and cost-effective alternative to traditional workflows. Code and datasets are publicly available at https://github.com/Anonymous-Name-139/RefAdgen.
Related papers
- HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images [61.3479037455798]
HiFi-Inpaint is a novel high-fidelity reference-based inpainting framework tailored for generating human-product images.<n>We introduce Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision.<n>We construct a new dataset, HP-Image-40K, with samples curated from self-synthesis data and processed with automatic filtering.
arXiv Detail & Related papers (2026-03-02T18:59:36Z) - EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model [56.53617289548353]
EchoGen is a pioneering framework that empowers Visual Auto-Regressive ( VAR) models with subject-driven generation capabilities.<n>We employ a semantic encoder to extract the subject's abstract identity, which is injected through decoupled cross-attention to guide the overall composition.<n>To the best of our knowledge, EchoGen is the first feed-forward subject-driven framework built upon VAR models.
arXiv Detail & Related papers (2025-09-30T11:45:48Z) - Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation [66.73899356886652]
We build an image tokenizer directly atop pre-trained vision foundation models.<n>Our proposed image tokenizer, VFMTok, achieves substantial improvements in image reconstruction and generation quality.<n>It further boosts autoregressive (AR) generation -- achieving a gFID of 2.07 on ImageNet benchmarks.
arXiv Detail & Related papers (2025-07-11T09:32:45Z) - DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers [30.583932208752877]
In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important.<n>We propose a Diffusion Transformer (DiT)-based framework to preserve human identities and product-specific details.<n>We employ a 3D body mesh template and product bounding boxes to provide precise motion guidance, enabling intuitive alignment of hand gestures with product placements.
arXiv Detail & Related papers (2025-06-12T10:58:23Z) - Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation [54.588082888166504]
We present Mogao, a unified framework that enables interleaved multi-modal generation through a causal approach.<n>Mogoo integrates a set of key technical improvements in architecture design, including a deep-fusion design, dual vision encoders, interleaved rotary position embeddings, and multi-modal classifier-free guidance.<n>Experiments show that Mogao achieves state-of-the-art performance in multi-modal understanding and text-to-image generation, but also excels in producing high-quality, coherent interleaved outputs.
arXiv Detail & Related papers (2025-05-08T17:58:57Z) - Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models [1.8606057023042066]
We present a framework for high-fidelity product image recontextualization using text-to-image diffusion models and a novel data augmentation pipeline.<n>Our method improves the quality and diversity of generated images by disentangling product representations and enhancing the model's understanding of product characteristics.
arXiv Detail & Related papers (2025-03-11T01:24:39Z) - CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective.<n>To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL)<n>Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z) - Generative AI for Vision: A Comprehensive Study of Frameworks and Applications [0.0]
Generative AI is transforming image synthesis, enabling the creation of high-quality, diverse, and photorealistic visuals.<n>This work presents a structured classification of image generation techniques based on the nature of the input.<n>We highlight key frameworks including DALL-E, ControlNet, and DeepSeek Janus-Pro, and address challenges such as computational costs, data biases, and output alignment with user intent.
arXiv Detail & Related papers (2025-01-29T22:42:05Z) - E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
arXiv Detail & Related papers (2024-01-11T18:59:14Z) - Interactive Data Synthesis for Systematic Vision Adaptation via
LLMs-AIGCs Collaboration [48.54002313329872]
We propose a new paradigm of annotated data expansion named as ChatGenImage.
The core idea behind it is to leverage the complementary strengths of diverse models to establish a highly effective and user-friendly pipeline for interactive data augmentation.
We present fascinating results obtained from our ChatGenImage framework and demonstrate the powerful potential of our synthetic data for systematic vision adaptation.
arXiv Detail & Related papers (2023-05-22T07:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.