See Beyond a Single View: Multi-Attribution Learning Leads to Better Conversion Rate Prediction
- URL: http://arxiv.org/abs/2508.15217v1
- Date: Thu, 21 Aug 2025 04:05:31 GMT
- Title: See Beyond a Single View: Multi-Attribution Learning Leads to Better Conversion Rate Prediction
- Authors: Sishuo Chen, Zhangming Chan, Xiang-Rong Sheng, Lei Zhang, Sheng Chen, Chenghuan Hou, Han Zhu, Jian Xu, Bo Zheng
- Abstract summary: Conversion rate (CVR) prediction is a core component of online advertising systems. Traditional approaches restrict model training to labels from a single production-critical attribution mechanism. We propose a novel Multi-Attribution Learning framework for CVR prediction that integrates signals from multiple attribution perspectives.
- Score: 30.31722186766052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversion rate (CVR) prediction is a core component of online advertising systems, where the attribution mechanisms (rules for allocating conversion credit across user touchpoints) fundamentally determine label generation and model optimization. While many industrial platforms support diverse attribution mechanisms (e.g., First-Click, Last-Click, Linear, and Data-Driven Multi-Touch Attribution), conventional approaches restrict model training to labels from a single production-critical attribution mechanism, discarding complementary signals in alternative attribution perspectives. To address this limitation, we propose a novel Multi-Attribution Learning (MAL) framework for CVR prediction that integrates signals from multiple attribution perspectives to better capture the underlying patterns driving user conversions. Specifically, MAL is a joint learning framework consisting of two core components: the Attribution Knowledge Aggregator (AKA) and the Primary Target Predictor (PTP). AKA is implemented as a multi-task learner that integrates knowledge extracted from diverse attribution labels. PTP, in contrast, focuses on generating well-calibrated conversion probabilities that align with the system-optimized attribution metric (e.g., CVR under Last-Click attribution), ensuring direct compatibility with industrial deployment requirements. Additionally, we propose CAT, a novel training strategy that leverages the Cartesian product of all attribution label combinations to generate enriched supervision signals. This design substantially enhances the performance of the attribution knowledge aggregator. Empirical evaluations demonstrate the superiority of MAL over single-attribution learning baselines, achieving a +0.51% GAUC improvement on offline metrics. Online experiments show that MAL achieves a +2.6% increase in ROI (Return on Investment).
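As a rough illustration of the CAT idea (a sketch of one plausible reading, not the paper's implementation), suppose each impression carries a binary conversion label under three assumed attribution mechanisms. The Cartesian product of these per-mechanism labels can then be enumerated and mapped to a single joint supervision index:

```python
from itertools import product

# Assumed set of attribution mechanisms; the paper also mentions
# Linear and Data-Driven Multi-Touch Attribution, among others.
MECHANISMS = ["first_click", "last_click", "linear"]

def joint_label(labels: dict) -> int:
    """Map per-mechanism binary labels to an index in the
    Cartesian-product label space (2**len(MECHANISMS) classes)."""
    idx = 0
    for m in MECHANISMS:
        idx = idx * 2 + int(labels[m])
    return idx

# Full product space of label combinations: 2^3 = 8 joint classes.
all_combos = list(product([0, 1], repeat=len(MECHANISMS)))

sample = {"first_click": 1, "last_click": 0, "linear": 1}
print(joint_label(sample))  # -> 5
print(len(all_combos))      # -> 8
```

Under this reading, the aggregator is trained against the enriched joint label space rather than any single mechanism's binary label, while the PTP head still emits a calibrated probability for the production metric (e.g., Last-Click CVR).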
Related papers
- MAC: A Conversion Rate Prediction Benchmark Featuring Labels Under Multiple Attribution Mechanisms [26.416305996561565]
Multi-attribution learning (MAL) enhances model performance by learning from conversion labels yielded by multiple attribution mechanisms. We establish the Multi-Attribution Benchmark (MAC), the first public CVR dataset featuring labels from multiple attribution mechanisms. We also develop PyMAL, an open-source library covering a wide array of baseline methods.
arXiv Detail & Related papers (2026-03-02T18:51:01Z) - RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation [17.34586562700226]
In online advertising, advertising text plays a critical role in attracting user engagement and driving advertiser value. We propose RELATE, a reinforcement learning-based end-to-end framework that unifies generation and objective alignment within a single model. To better capture ultimate advertiser value beyond click-level signals, we incorporate conversion-oriented metrics into the objective and jointly model them with compliance constraints as multi-dimensional rewards.
arXiv Detail & Related papers (2026-02-12T10:00:55Z) - EST: Towards Efficient Scaling Laws in Click-Through Rate Prediction via Unified Modeling [13.693397814262681]
Efficiently scaling industrial Click-Through Rate (CTR) prediction has recently attracted significant research attention. We propose the Efficiently Scalable Transformer (EST), which achieves fully unified modeling by processing all raw inputs in a single sequence without lossy aggregation. EST significantly outperforms production baselines, delivering a 3.27% RPM (Revenue Per Mille) increase and a 1.22% CTR lift.
arXiv Detail & Related papers (2026-02-11T12:51:54Z) - No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction [48.578518946398354]
In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning to train a unified model on post-click data to estimate the conversion rate (CVR) for diverse targets. In practice, CVR prediction often encounters missing conversion data, as many advertisers submit only a subset of user conversion actions due to privacy or other constraints.
arXiv Detail & Related papers (2025-12-15T13:14:20Z) - MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources [113.33902847941941]
Variance-Aware Sampling (VAS) is a data selection strategy guided by Variance Promotion Score (VPS). We release large-scale, carefully curated resources containing 1.6M long CoT cold-start data and 15k RL QA pairs. Experiments across mathematical reasoning benchmarks demonstrate the effectiveness of both the curated data and the proposed VAS.
arXiv Detail & Related papers (2025-09-25T14:58:29Z) - From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models [0.6727984016678534]
Classifying patents by their relevance to the UN Sustainable Development Goals (SDGs) is crucial for tracking how innovation addresses global challenges. This paper frames patent-to-SDG classification as a weak supervision problem, using citations from patents to scientific publications (NPL citations) as a noisy initial signal. We develop a composite labeling function (LF) that uses large language models (LLMs) to extract structured concepts from patents and papers.
arXiv Detail & Related papers (2025-09-11T09:44:16Z) - Intrinsic Training Signals for Federated Learning Aggregation [10.532838477096055]
Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. This work demonstrates that effective model merging can be achieved solely through existing training signals.
arXiv Detail & Related papers (2025-07-09T13:03:23Z) - SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP). On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z) - Learning Item Representations Directly from Multimodal Features for Effective Recommendation [51.49251689107541]
Multimodal recommender systems predominantly leverage Bayesian Personalized Ranking (BPR) optimization to learn item representations. We propose a novel model (i.e., LIRDRec) that learns item representations directly from multimodal features to augment recommendation performance.
arXiv Detail & Related papers (2025-05-08T05:42:22Z) - Enhancing Customer Churn Prediction in Telecommunications: An Adaptive Ensemble Learning Approach [0.0]
This paper proposes a novel adaptive ensemble learning framework for highly accurate customer churn prediction.
The framework integrates multiple base models, including XGBoost, LightGBM, LSTM, a Multi-Layer Perceptron (MLP) neural network, and Support Vector Machine (SVM).
The research achieves a remarkable 99.28% accuracy, signifying a major advancement in churn prediction.
arXiv Detail & Related papers (2024-08-29T06:27:42Z) - USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z) - Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset, and define two real-world instance-level retrieval tasks.
We train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - Creating Training Sets via Weak Indirect Supervision [66.77795318313372]
Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources.
We formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels.
We develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources.
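To ground the weak supervision setting these snippets describe, here is a minimal, hedged sketch of label synthesis from multiple noisy sources (a plain majority-vote baseline, not the paper's PLRM model; the `ABSTAIN` convention and labeling-function matrix layout are assumptions for illustration):

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: a labeling function may abstain

def majority_vote(votes):
    """Combine noisy labeling-function outputs into one training label.
    Ties and all-abstain rows yield ABSTAIN (the sample stays unlabeled)."""
    valid = [v for v in votes if v != ABSTAIN]
    if not valid:
        return ABSTAIN
    top_two = Counter(valid).most_common(2)
    if len(top_two) > 1 and top_two[0][1] == top_two[1][1]:
        return ABSTAIN  # tie between competing labels
    return top_two[0][0]

# Rows: samples; columns: outputs of three hypothetical labeling functions.
lf_matrix = [
    [1, 1, ABSTAIN],        # two sources agree -> label 1
    [0, 1, ABSTAIN],        # conflict -> ABSTAIN
    [ABSTAIN, ABSTAIN, ABSTAIN],  # no signal -> ABSTAIN
]
labels = [majority_vote(row) for row in lf_matrix]
print(labels)  # -> [1, -1, -1]
```

Probabilistic label models such as PLRM replace this vote with a learned model of source accuracies and label relations, but the input/output contract (noisy sources in, synthesized training labels out) is the same.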
arXiv Detail & Related papers (2021-10-07T14:09:35Z) - Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm.
To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization.
Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.