Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments
- URL: http://arxiv.org/abs/2509.22677v1
- Date: Tue, 16 Sep 2025 02:24:20 GMT
- Title: Profit over Proxies: A Scalable Bayesian Decision Framework for Optimizing Multi-Variant Online Experiments
- Authors: Srijesh Pillai, Rajesh Kumar Chandrawat
- Abstract summary: Online controlled experiments (A/B tests) are fundamental to data-driven decision-making in the digital economy. Heuristics like "p-value peeking" inflate false positive rates, and an over-reliance on proxy metrics like conversion rates can lead to decisions that inadvertently harm core business profitability. This paper introduces a comprehensive and scalable Bayesian decision framework designed for profit optimization in multi-variant (A/B/n) experiments.
- Score: 0.0352925259310339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online controlled experiments (A/B tests) are fundamental to data-driven decision-making in the digital economy. However, their real-world application is frequently compromised by two critical shortcomings: the use of statistically flawed heuristics like "p-value peeking", which inflates false positive rates, and an over-reliance on proxy metrics like conversion rates, which can lead to decisions that inadvertently harm core business profitability. This paper addresses these challenges by introducing a comprehensive and scalable Bayesian decision framework designed for profit optimization in multi-variant (A/B/n) experiments. We propose a hierarchical Bayesian model that simultaneously estimates the probability of conversion (using a Beta-Bernoulli model) and the monetary value of that conversion (using a robust Bayesian model for the mean transaction value). Building on this, we employ a decision-theoretic stopping rule based on Expected Loss, enabling experiments to be concluded not only when a superior variant is identified but also when it becomes clear that no variant offers a practically significant improvement (stopping for futility). The framework successfully navigates "revenue traps" where a variant with a higher conversion rate would have resulted in a net financial loss, correctly terminates futile experiments early to conserve resources, and maintains strict statistical integrity throughout the monitoring process. Ultimately, this work provides a practical and principled methodology for organizations to move beyond simple A/B testing towards a mature, profit-driven experimentation culture, ensuring that statistical conclusions translate directly to strategic business value.
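The abstract's two ingredients, a Beta-Bernoulli conversion model and an Expected Loss stopping rule, can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation. The function names, the uniform Beta(1, 1) prior, the loss threshold, and the example numbers are all hypothetical, and a plain normal approximation stands in for the paper's robust model of the mean transaction value.

```python
import numpy as np


def value_per_visitor_draws(visitors, conversions, mean_value, sd_value,
                            n_draws=20_000, seed=0):
    """Draw posterior samples of expected revenue per visitor for each variant."""
    rng = np.random.default_rng(seed)
    visitors = np.asarray(visitors, dtype=float)
    conversions = np.asarray(conversions, dtype=float)
    mean_value = np.asarray(mean_value, dtype=float)
    sd_value = np.asarray(sd_value, dtype=float)
    shape = (n_draws, visitors.size)

    # Conversion probability: Beta(1, 1) prior updated with Bernoulli outcomes.
    p = rng.beta(1.0 + conversions, 1.0 + visitors - conversions, size=shape)

    # Mean transaction value: normal approximation (an assumption made here,
    # standing in for the paper's robust model of the mean).
    se = sd_value / np.sqrt(np.maximum(conversions, 1.0))
    mu = rng.normal(mean_value, se, size=shape)

    # Revenue per visitor = P(conversion) * mean value of a conversion.
    return p * mu


def expected_loss(draws):
    """Expected revenue per visitor forgone by choosing each variant."""
    best = draws.max(axis=1, keepdims=True)
    return (best - draws).mean(axis=0)


def decide(loss, threshold):
    """Return the index of the variant to ship, or None to keep collecting data."""
    best = int(np.argmin(loss))
    return best if loss[best] < threshold else None


# Hypothetical "revenue trap": B converts more often but earns less per sale.
draws = value_per_visitor_draws(
    visitors=[10_000, 10_000],
    conversions=[500, 600],
    mean_value=[50.0, 38.0],
    sd_value=[20.0, 20.0],
)
loss = expected_loss(draws)
choice = decide(loss, threshold=0.05)
```

In this made-up example, variant B has the higher conversion rate but the lower revenue per visitor, so a rule that ranks variants by expected loss in revenue favors A. A futility check of the kind the abstract describes would additionally compare the winner's expected gain against a minimum practically significant improvement, which this sketch leaves out.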
Related papers
- Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior.
This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias.
Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z)
- Case-Guided Sequential Assay Planning in Drug Discovery [2.8529443025686487]
Implicit Bayesian Markov Decision Process (IBMDP) is a model-based RL framework designed for simulator-free settings.
IBMDP generates stable policies that balance information gain toward desired outcomes with resource efficiency.
On a real-world central nervous system (CNS) drug discovery task, IBMDP reduced resource consumption by up to 92% compared to established methods.
arXiv Detail & Related papers (2026-01-21T06:58:01Z)
- Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions [16.315158617837646]
This work contributes the first practical and validated framework for reliable Off-Policy Evaluation (OPE) in deterministic auction environments.
We introduce the first principled framework for OPE in deterministic auctions by repurposing the bid landscape model to approximate the propensity score.
We validate our approach on the AuctionNet simulation benchmark and against a two-week online A/B test from a large-scale industrial platform.
arXiv Detail & Related papers (2025-12-03T01:37:42Z)
- ZIP-RC: Optimizing Test-Time Compute via Zero-Overhead Joint Reward-Cost Prediction [57.799425838564]
We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost.
ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost.
arXiv Detail & Related papers (2025-12-01T09:44:31Z)
- Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability [14.00844847268286]
Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediary layers.
Our framework demonstrates consistent improvements in speedup (1.70-2.10x) with a minimal performance drop (2%) as compared to full model performance.
arXiv Detail & Related papers (2025-09-28T06:05:24Z)
- T-TAMER: Provably Taming Trade-offs in ML Serving [32.526955555483354]
We present a general framework, T-Tamer, which formalizes this setting as a multi-stage decision process.
Our main result shows that recall is both necessary and sufficient for achieving provable performance guarantees.
The results show that recall-based strategies consistently yield efficient accuracy-latency trade-offs.
arXiv Detail & Related papers (2025-09-26T23:08:03Z)
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem.
Users pay for specific models but have no guarantee that providers deliver them faithfully.
We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions.
We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
- Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding [52.1068936424622]
We consider the problem of estimating the expected causal effect $E[Y|do(X)]$ for a target variable $Y$ when treatment $X$ is set by intervention.
In settings without selection bias or confounding, $E[Y|do(X)] = E[Y|X]$, which can be estimated using standard regression methods.
We propose a framework that incorporates both selection bias and confounding.
arXiv Detail & Related papers (2025-03-26T13:43:37Z)
- FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making.
FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
- Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests [0.0]
A/B testing is a core tool for decision-making in business experimentation, particularly in digital platforms and marketplaces.
This paper develops a decision-theoretic framework for maximizing expected profit subject to a constraint on the cost-weighted false discovery rate (FDR).
We propose an empirical Bayes approach that uses a greedy knapsack algorithm to rank experiments based on the ratio of expected lift to cost, incorporating the local false discovery rate (lfdr) as a key statistic.
arXiv Detail & Related papers (2024-07-01T07:40:08Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Detecting Toxic Flow [0.40964539027092917]
This paper develops a framework to predict toxic trades that a broker receives from her clients.
We use a proprietary dataset of foreign exchange transactions to test our methodology.
We devise a strategy for the broker who uses toxicity predictions to internalise or to externalise each trade received from her clients.
arXiv Detail & Related papers (2023-12-10T09:00:09Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation $\alpha$-BN to calibrate the batch statistics.
We also present a novel loss function to form a unified test time adaptation framework Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Financial Data Analysis Using Expert Bayesian Framework For Bankruptcy Prediction [0.0]
We propose another route of generative modeling using Expert Bayesian framework.
The biggest advantage of the proposed framework is an explicit inclusion of expert judgment in the modeling process.
The proposed approach is well suited for highly regulated or safety critical applications such as in finance or in medical diagnosis.
arXiv Detail & Related papers (2020-10-19T19:09:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.