Generative Modeling with Multi-Instance Reward Learning for E-commerce Creative Optimization
- URL: http://arxiv.org/abs/2508.09730v1
- Date: Wed, 13 Aug 2025 11:53:41 GMT
- Title: Generative Modeling with Multi-Instance Reward Learning for E-commerce Creative Optimization
- Authors: Qiaolei Gu, Yu Li, DingYi Zeng, Lu Wang, Ming Pang, Changping Peng, Zhangang Lin, Ching Law, Jingping Shao,
- Abstract summary: In e-commerce advertising, selecting the most compelling combination of creative elements is critical for capturing user attention and driving conversions. We propose a novel framework named GenCO that integrates generative modeling with multi-instance reward learning. Our approach has significantly increased advertising revenue, demonstrating its practical value.
- Score: 15.51942931334223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In e-commerce advertising, selecting the most compelling combination of creative elements -- such as titles, images, and highlights -- is critical for capturing user attention and driving conversions. However, existing methods often evaluate creative components individually, failing to navigate the exponentially large search space of possible combinations. To address this challenge, we propose a novel framework named GenCO that integrates generative modeling with multi-instance reward learning. Our unified two-stage architecture first employs a generative model to efficiently produce a diverse set of creative combinations. This generative process is optimized with reinforcement learning, enabling the model to effectively explore and refine its selections. Next, to overcome the challenge of sparse user feedback, a multi-instance learning model attributes combination-level rewards, such as clicks, to the individual creative elements. This allows the reward model to provide a more accurate feedback signal, which in turn guides the generative model toward creating more effective combinations. Deployed on a leading e-commerce platform, our approach has significantly increased advertising revenue, demonstrating its practical value. Additionally, we are releasing a large-scale industrial dataset to facilitate further research in this important domain.
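The abstract does not spell out how combination-level rewards are attributed to individual elements, so the following is only a minimal sketch of the general idea, not GenCO's actual reward model. It assumes an additive form in which the click probability of a combination is the sigmoid of the sum of its elements' scores, fitted with plain stochastic gradient descent on log loss. The element names, the toy click log, and the helpers `fit_element_scores` and `best_combination` are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_element_scores(bags, labels, epochs=200, lr=0.1):
    """Learn one score per creative element from combination-level labels.

    Assumed model (not from the paper): p(click | bag) = sigmoid(sum of scores).
    """
    scores = {element: 0.0 for bag in bags for element in bag}
    for _ in range(epochs):
        for bag, clicked in zip(bags, labels):
            predicted = sigmoid(sum(scores[e] for e in bag))
            grad = predicted - clicked  # d(log loss) / d(bag logit)
            # Additive pooling: every element in the bag shares the gradient.
            for e in bag:
                scores[e] -= lr * grad
    return scores


def best_combination(scores, slots):
    """Pick the highest-scoring candidate independently for each slot."""
    return tuple(max(candidates, key=scores.__getitem__) for candidates in slots)


# Hypothetical toy log: each bag is a (title, image) combination, and the
# label records whether that impression was clicked.
bags = (
    [("t_good", "i_good")] * 4
    + [("t_good", "i_bad")] * 4
    + [("t_bad", "i_good")] * 4
    + [("t_bad", "i_bad")] * 4
)
labels = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

scores = fit_element_scores(bags, labels)
best = best_combination(scores, [["t_good", "t_bad"], ["i_good", "i_bad"]])
print(scores)
print(best)
```

In this sketch the per-element scores play the role of the denser feedback signal that the abstract describes. In the two-stage setup, a generative model trained with reinforcement learning would consume such a signal rather than the exhaustive per-slot argmax used here.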
Related papers
- PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards [86.1965460124838]
We propose a scalable multi-subject data generation pipeline. We first enable single-subject personalization models to acquire knowledge of multi-image and multi-subject scenarios. To enhance both subject consistency and text controllability, we design a set of Pairwise Subject-Consistency Rewards.
arXiv Detail & Related papers (2025-12-01T03:25:49Z) - SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model [49.65930977591188]
Multimodal embedding models aim to yield informative unified representations that empower diverse cross-modal tasks. We introduce SAIL-Embedding, an omni-modal embedding foundation model that addresses these issues through tailored training strategies and architectural design. Specifically, the content-aware progressive training aims to enhance the model's adaptability to diverse downstream tasks and master enriched cross-modal proficiency. The collaboration-aware recommendation enhancement training further adapts multimodal representations for recommendation scenarios by distilling knowledge from sequence-to-item and ID-to-item embeddings.
arXiv Detail & Related papers (2025-10-14T16:43:22Z) - HLLM-Creator: Hierarchical LLM-based Personalized Creative Generation [24.94016591437963]
Current AIGC systems rely heavily on creators' inspiration, rarely generating truly user-personalized content. We propose HLLM-Creator, a hierarchical LLM framework for efficient user interest modeling and personalized content generation. Experiments on personalized title generation for Douyin Search Ads show the effectiveness of our model.
arXiv Detail & Related papers (2025-08-25T15:23:21Z) - CTR-Driven Advertising Image Generation with Multimodal Large Language Models [53.40005544344148]
We explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL). Our method achieves state-of-the-art performance in both online and offline metrics.
arXiv Detail & Related papers (2025-02-05T09:06:02Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - DiffMM: Multi-Modal Diffusion Model for Recommendation [19.43775593283657]
We propose a novel multi-modal graph diffusion model for recommendation called DiffMM.
Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning.
arXiv Detail & Related papers (2024-06-17T17:35:54Z) - Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic personalization generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z) - A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model [8.945197427679924]
Traditional AI-based approaches face the same problem of not considering user information while having limited aesthetic knowledge from designers.
To optimize the results, the generated creatives in traditional methods are then ranked by another module named creative ranking model.
This paper proposes a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage.
arXiv Detail & Related papers (2024-01-17T03:27:39Z) - AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting [7.515971669919419]
In digital advertising, the selection of the optimal item (recommendation) and its best creative presentation (creative optimization) have traditionally been considered separate disciplines.
We introduce the task of generative models for creative generation that incorporate user interests, and AdBooster, a model for personalized ad creatives.
arXiv Detail & Related papers (2023-09-08T12:57:05Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising [16.527943807941856]
This paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements.
In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element.
Experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics.
arXiv Detail & Related papers (2023-07-04T09:32:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.