CaliCausalRank: Calibrated Multi-Objective Ad Ranking with Robust Counterfactual Utility Optimization
- URL: http://arxiv.org/abs/2602.18786v1
- Date: Sat, 21 Feb 2026 10:35:12 GMT
- Title: CaliCausalRank: Calibrated Multi-Objective Ad Ranking with Robust Counterfactual Utility Optimization
- Authors: Xikai Yang, Sebastian Sun, Yilin Li, Yue Xing, Ming Wang, Yang Wang,
- Abstract summary: CaliCausalRank is a framework that integrates training-time scale calibration, constraint-based multi-objective optimization, and robust counterfactual utility estimation.<n>Our approach treats score calibration as a first-class training objective rather than post-hoc processing, employs Lagrangian relaxation for constraint satisfaction, and achieves variance-reduced counterfactual estimators for reliable offline evaluation.
- Score: 9.601427882648116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ad ranking systems must simultaneously optimize multiple objectives including click-through rate (CTR), conversion rate (CVR), revenue, and user experience metrics. However, production systems face critical challenges: score scale inconsistency across traffic segments undermines threshold transferability, and position bias in click logs causes offline-online metric discrepancies. We propose CaliCausalRank, a unified framework that integrates training-time scale calibration, constraint-based multi-objective optimization, and robust counterfactual utility estimation. Our approach treats score calibration as a first-class training objective rather than post-hoc processing, employs Lagrangian relaxation for constraint satisfaction, and utilizes variance-reduced counterfactual estimators for reliable offline evaluation. Experiments on the Criteo and Avazu datasets demonstrate that CaliCausalRank achieves 1.1% relative AUC improvement, 31.6% calibration error reduction, and 3.2% utility gain compared to the best baseline (PairRank) while maintaining consistent performance across different traffic segments.
Related papers
- RGAlign-Rec: Ranking-Guided Alignment for Latent Query Reasoning in Recommendation Systems [25.34524038198569]
We propose RGAlign-Rec, a closed-loop alignment framework for proactive intent prediction.<n>We also introduce Ranking-Guided Alignment (RGA), a multi-stage training paradigm.<n>Our framework achieves a 0.12% gain in GAUC, leading to a significant 3.52% relative reduction in error rate, and a 0.56% improvement in Recall@3.
arXiv Detail & Related papers (2026-02-13T14:38:02Z) - CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization.<n>CAGE significantly improves upon the state-of-theart methods in terms of accuracy, for similar computational cost.<n>For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4-bits (W4A4) with the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.<n>A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.<n>We introduce Continual AQA (CAQA), which equips with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z) - UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization [19.673388630963807]
We propose UniCBE, a unified uniformity-driven CBE framework.<n>On the AlpacaEval benchmark, UniCBE saves over 17% of evaluation budgets while achieving a Pearson correlation with ground truth exceeding 0.995.<n>In scenarios where new models are continuously introduced, UniCBE can even save over 50% of evaluation costs.
arXiv Detail & Related papers (2025-02-17T05:28:12Z) - Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors [22.39558434131574]
We introduce the concept of consistency as an alternative perspective on model calibration.
We propose a post-hoc calibration method called Consistency (CC) which adjusts confidence based on the model's consistency across inputs.
We show that performing perturbations at the logit level significantly improves computational efficiency.
arXiv Detail & Related papers (2024-10-16T06:55:02Z) - A Self-boosted Framework for Calibrated Ranking [7.4291851609176645]
Calibrated Ranking is a scale-calibrated ranking system that pursues accurate ranking quality and calibrated probabilistic predictions simultaneously.
Previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss.
We propose a Self-Boosted framework for Calibrated Ranking (SBCR)
arXiv Detail & Related papers (2024-06-12T09:00:49Z) - Lower-Left Partial AUC: An Effective and Efficient Optimization Metric
for Recommendation [52.45394284415614]
We propose a new optimization metric, Lower-Left Partial AUC (LLPAUC), which is computationally efficient like AUC but strongly correlates with Top-K ranking metrics.
LLPAUC considers only the partial area under the ROC curve in the Lower-Left corner to push the optimization focus on Top-K.
arXiv Detail & Related papers (2024-02-29T13:58:33Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Threshold-Consistent Margin Loss for Open-World Deep Metric Learning [42.03620337000911]
Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures.
Inconsistency often complicates the threshold selection process when deploying commercial image retrieval systems.
We propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes.
arXiv Detail & Related papers (2023-07-08T21:16:41Z) - Joint Optimization of Ranking and Calibration with Contextualized Hybrid
Model [24.66016187602343]
We propose an approach that can Jointly optimize the Ranking and abilities (JRC) for short.
JRC improves the ranking ability by contrasting the logit value for the sample with different labels and constrains the predicted probability to be a function of the logit subtraction.
JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.
arXiv Detail & Related papers (2022-08-12T08:32:13Z) - Reduced Reference Perceptual Quality Model and Application to Rate
Control for 3D Point Cloud Compression [61.110938359555895]
In rate-distortion optimization, the encoder settings are determined by maximizing a reconstruction quality measure subject to a constraint on the bit rate.
We propose a linear perceptual quality model whose variables are the V-PCC geometry and color quantization parameters.
Subjective quality tests with 400 compressed 3D point clouds show that the proposed model correlates well with the mean opinion score.
We show that for the same target bit rate, ratedistortion optimization based on the proposed model offers higher perceptual quality than rate-distortion optimization based on exhaustive search with a point-to-point objective quality metric.
arXiv Detail & Related papers (2020-11-25T12:42:02Z) - Re-Assessing the "Classify and Count" Quantification Method [88.60021378715636]
"Classify and Count" (CC) is often a biased estimator.
Previous works have failed to use properly optimised versions of CC.
We argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy.
arXiv Detail & Related papers (2020-11-04T21:47:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.