Related papers: Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests

Related papers

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs.<n>We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation.<n>To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimize the accuracy-efficiency trade-off via principled resource allocation.<n>We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay [0.0]
Enhanced-FQL($$) achieves superior sample efficiency and reduced variance compared to n-step fuzzy TD and fuzzyA($$) baselines.<n>The framework's inherent interpretability, combined with its computational efficiency and theoretical convergence guarantees, makes it suitable for safety-critical applications.
arXiv Detail & Related papers (2026-01-07T20:59:18Z)
Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions [16.315158617837646]
This work contributes the first practical and validated framework for reliable Off-Policy Evaluation (OPE) in deterministic auction environments.<n>We introduce the first principled framework for OPE in deterministic auctions by repurposing the bid landscape model to approximate the propensity score.<n>We validate our approach on the AuctionNet simulation benchmark and against 2-weeks online A/B test from a large-scale industrial platform.
arXiv Detail & Related papers (2025-12-03T01:37:42Z)
Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments [0.0352925259310339]
Online controlled experiments (A/B tests) are fundamental to data-driven decision-making in the digital economy.<n>"p-value" inflates false positive rates, and an over-reliance on proxy metrics like conversion rates can lead to decisions that inadvertently harm core business profitability.<n>This paper introduces a comprehensive and scalable Bayesian decision framework designed for profit optimization in multi-variant (A/B/n) experiments.
arXiv Detail & Related papers (2025-09-16T02:24:20Z)
Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner [24.152878302325508]
We introduce the reward-shifted speculative sampling (SSS) algorithm, in which the draft model is aligned with human preferences, while the target model remains unchanged.<n>Our algorithm achieves superior gold reward scores at a significantly reduced inference cost in test-time weak-to-strong alignment experiments.
arXiv Detail & Related papers (2025-08-20T20:10:56Z)
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.<n>COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.<n>We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.<n>We show that the widely used beam search method suffers from unacceptable over-optimism.<n>We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
$t$-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing [3.988614978933934]
A/B-tests are a cornerstone of experimental design on the web, with wide-ranging applications and use-cases. We propose a practical method to test whether the $t$-test's assumptions are met, and the A/B-test is valid. This provides an efficient and effective way to empirically assess whether the $t$-test's assumptions are met, and the A/B-test is valid.
arXiv Detail & Related papers (2025-02-07T09:55:24Z)
An Upper Confidence Bound Approach to Estimating the Maximum Mean [0.0]
We study estimation of the maximum mean using an upper confidence bound (UCB) approach. We establish statistical guarantees, including strong consistency, mean squared errors, and central limit theorems (CLTs) for both estimators.
arXiv Detail & Related papers (2024-08-08T02:53:09Z)
Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. We propose a single framework built on their equivalence in learning scenarios. Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections [57.67176250198289]
AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test. We explore schemes and settings more extensively, to identify and recommend efficient choices for practice. A limitation of the current AWAIRE implementation is its restriction to a small number of candidates.
arXiv Detail & Related papers (2024-02-18T10:13:01Z)
Boosting Fair Classifier Generalization through Adaptive Priority Reweighing [59.801444556074394]
A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability.
arXiv Detail & Related papers (2023-09-15T13:04:55Z)
A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks [3.623570119514559]
We propose a semi-Bayesian nonparametric (semi-BNP) procedure for the goodness-of-fit (GOF) test. Our method introduces a novel Bayesian estimator for the maximum mean discrepancy (MMD) measure. We demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis.
arXiv Detail & Related papers (2023-03-05T10:36:21Z)
Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
Optimal Treatment Regimes for Proximal Causal Learning [7.672587258250301]
We propose a novel optimal individualized treatment regime based on outcome and treatment confounding bridges. We show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature.
arXiv Detail & Related papers (2022-12-19T14:29:25Z)
Empirical Bayesian Approaches for Robust Constraint-based Causal Discovery under Insufficient Data [38.883810061897094]
Causal discovery methods assume data sufficiency, which may not be the case in many real world datasets. We propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data. Experiments show significant performance improvement in terms of both accuracy and efficiency over SOTA methods.
arXiv Detail & Related papers (2022-06-16T21:08:49Z)
Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together. The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z)
Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance. We provide a novel cross-validation-like methodology to address this challenge. We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Understanding and Mitigating the Limitations of Prioritized Experience Replay [46.663239542920984]
Prioritized Replay Experience (ER) has been empirically shown to improve sample efficiency across many domains. We show equivalence between the error-based prioritized sampling method for mean squared error and uniform sampling for cubic power loss. We then provide theoretical insight into why it improves convergence rate upon uniform sampling during early learning.
arXiv Detail & Related papers (2020-07-19T03:10:02Z)
Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. Previous works ignore the losing auctions to alleviate the difficulty with censored states. We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
PAPRIKA: Private Online False Discovery Rate Control [27.698099204682105]
We study False Discovery Rate (FDR) control in hypothesis testing under the constraint of differential privacy for the sample. We provide new private algorithms based on state-of-the-art results in non-private online FDR control.
arXiv Detail & Related papers (2020-02-27T18:42:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.