GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation
- URL: http://arxiv.org/abs/2511.10138v1
- Date: Fri, 14 Nov 2025 01:34:51 GMT
- Authors: Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, Jiawei Sun, Xin Xu, Zishuai Zhang, Ruoran Liu, Suyuan Huang, Zhaoxin Zhang, Zhengkai Guo, Shuojin Yang, Meng-Hao Guo, Huan Yu, Jie Jiang, Shi-Min Hu
- Abstract summary: We propose GPR (Generative Pre-trained Recommender), a one-model framework that redefines advertising recommendation as an end-to-end generative task. We introduce three key innovations spanning unified representation, network architecture, and training strategy. GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics.
- Score: 38.48999566011862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. However, existing multi-stage advertising recommendation systems suffer from objective misalignment and error propagation, making it difficult to achieve global optimality, while unified generative recommendation models still struggle to meet the demands of practical industrial applications. To address these issues, we propose GPR (Generative Pre-trained Recommender), the first one-model framework that redefines advertising recommendation as an end-to-end generative task, replacing the traditional cascading paradigm with a unified generative approach. To realize GPR, we introduce three key innovations spanning unified representation, network architecture, and training strategy. First, we design a unified input schema and tokenization method tailored to advertising scenarios, mapping both ads and organic content into a shared multi-level semantic ID space, thereby enhancing semantic alignment and modeling consistency across heterogeneous data. Second, we develop the Heterogeneous Hierarchical Decoder (HHD), a dual-decoder architecture that decouples user intent modeling from ad generation, achieving a balance between training efficiency and inference flexibility while maintaining strong modeling capacity. Finally, we propose a multi-stage joint training strategy that integrates Multi-Token Prediction (MTP), Value-Aware Fine-Tuning and the Hierarchy Enhanced Policy Optimization (HEPO) algorithm, forming a complete generative recommendation pipeline that unifies interest modeling, value alignment, and policy optimization. GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics including GMV and CTCVR.
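The abstract describes mapping ads and organic content into a shared multi-level semantic ID space but does not publish the tokenizer itself. A common way to realize such a hierarchy is residual quantization (as in RQ-VAE-style tokenizers): each level's codebook quantizes the residual left over from the previous level, yielding a coarse-to-fine ID tuple per item. The sketch below is a minimal illustration under that assumption, with randomly initialized (hypothetical) codebooks rather than learned ones.

```python
import numpy as np

def semantic_ids(embedding, codebooks):
    """Map a dense item embedding to multi-level semantic IDs via
    residual quantization: at each level, pick the nearest codeword,
    then quantize the remaining residual at the next level."""
    ids = []
    residual = np.asarray(embedding, dtype=np.float64)
    for codebook in codebooks:                # one codebook per level
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))           # nearest codeword at this level
        ids.append(idx)
        residual = residual - codebook[idx]   # pass the residual down a level
    return ids

# Toy setup: 3 levels, 8 codewords per level, 4-d embeddings.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
ad_embedding = rng.normal(size=4)
print(semantic_ids(ad_embedding, codebooks))  # a 3-level semantic ID tuple
```

Because every level refines the residual of the one before, items that share a prefix of IDs are semantically close at that granularity, which is what lets ads and organic content live in one shared token space.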
Related papers
- OneRanker: Unified Generation and Ranking with One Model in Industrial Advertising Recommendation [16.27240743307534]
We propose OneRanker, achieving architectural-level deep integration of generation and ranking. We construct a coarse-to-fine collaborative target awareness mechanism. The full deployment on Tencent's WeiXin channels advertising system has shown a significant improvement in key business metrics.
arXiv Detail & Related papers (2026-03-03T13:50:22Z)
- Generative Recommendation for Large-Scale Advertising [43.694084612630554]
We present a production-oriented generative recommender co-designed across architecture, learning, and serving. GR4AD has been fully deployed in the Kuaishou advertising system with over 400 million users.
arXiv Detail & Related papers (2026-02-26T08:15:26Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents [10.528687017443852]
Co-EPG is a self-iterative training framework for the Co-Evolution of Planning and Grounding. This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.
arXiv Detail & Related papers (2025-11-13T03:41:02Z)
- Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction [6.663141182602147]
We propose Decoupled Multimodal Fusion (DMF) to enable fine-grained interactions between ID-based collaborative representations and multimodal representations for user interest modeling. We construct target-aware features to bridge the semantic gap across different embedding spaces and leverage them as side information to enhance the effectiveness of user interest modeling. DMF has been deployed on the product recommendation system of the international e-commerce platform, achieving relative improvements of 5.30% in CTCVR and 7.43% in GMV with negligible computational overhead.
arXiv Detail & Related papers (2025-10-13T07:06:26Z)
- A Unified Multi-Task Learning Framework for Generative Auto-Bidding with Validation-Aligned Optimization [51.27959658504722]
Multi-task learning offers a principled framework to train these tasks jointly through shared representations. Existing multi-task optimization strategies are primarily guided by training dynamics and often generalize poorly in volatile bidding environments. We present Validation-Aligned Multi-task Optimization (VAMO), which adaptively assigns task weights based on the alignment between per-task training gradients and a held-out validation gradient.
arXiv Detail & Related papers (2025-10-09T03:59:51Z)
- RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs). RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z)
- EGA-V2: An End-to-end Generative Framework for Industrial Advertising [19.927005856735445]
We introduce End-to-End Generative Advertising (EGA-V2), the first unified framework that holistically models user interests, point-of-interest (POI) and creative generation, ad allocation, and payment optimization. Our results highlight its potential as a pioneering fully generative advertising solution, paving the way for next-generation industrial ad systems.
arXiv Detail & Related papers (2025-05-23T06:55:02Z)
- Action is All You Need: Dual-Flow Generative Ranking Network for Recommendation [25.30922374657862]
We propose a Dual-Flow Generative Ranking Network (DFGR) that employs a dual-flow mechanism to optimize interaction modeling. DFGR duplicates the original user behavior sequence into a real flow and a fake flow based on the authenticity of the action information. This design reduces computational overhead and improves both training efficiency and inference performance compared to Meta's HSTU-based model.
arXiv Detail & Related papers (2025-05-22T14:58:53Z)
- MMaDA: Multimodal Large Diffusion Language Models [61.13527224215318]
We introduce MMaDA, a novel class of multimodal diffusion foundation models. It is designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation.
arXiv Detail & Related papers (2025-05-21T17:59:05Z)
- Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator [60.07198935747619]
We propose the Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS to adopt a dynamic semantic index paradigm.
To be more specific, we contrive for the first time a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender.
The proposed TTDS recommender achieves average improvements of 19.41% in Hit-Rate and 20.84% in NDCG over the leading baseline methods.
arXiv Detail & Related papers (2024-09-14T01:45:04Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.