Experimentation Platforms Meet Reinforcement Learning: Bayesian
Sequential Decision-Making for Continuous Monitoring
- URL: http://arxiv.org/abs/2304.00420v1
- Date: Sun, 2 Apr 2023 00:59:10 GMT
- Title: Experimentation Platforms Meet Reinforcement Learning: Bayesian
Sequential Decision-Making for Continuous Monitoring
- Authors: Runzhe Wan, Yu Liu, James McQueen, Doug Hains, Rui Song
- Abstract summary: In this paper, we introduce a novel framework that we developed at Amazon to maximize customer experience and control opportunity cost.
We formulate the problem as a Bayesian optimal sequential decision-making problem with a unified utility function.
We show the effectiveness of this novel approach compared with existing methods via a large-scale meta-analysis of experiments at Amazon.
- Score: 13.62951379287041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing need for online A/B testing to support innovation in
industry, the opportunity cost of running an experiment becomes non-negligible.
Therefore, there is an increasing demand for an efficient continuous monitoring
service that allows early stopping when appropriate. Classic statistical
methods focus on hypothesis testing and were mostly developed for traditional
high-stakes problems such as clinical trials, while experiments at online
service companies typically have very different features and priorities.
Motivated by these practical needs, in this paper we introduce a novel
framework that we developed at Amazon to maximize customer experience and
control opportunity cost. We formulate the problem as a Bayesian optimal
sequential decision-making problem with a unified utility function. We discuss
practical design choices and considerations extensively. We further show how to
solve for the optimal decision rule via Reinforcement Learning and how to scale
the solution. We show the effectiveness of this novel approach compared with
existing methods via a large-scale meta-analysis of experiments at Amazon.
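The abstract describes the approach only at a high level. As a rough illustration of the kind of Bayesian sequential stopping rule it refers to, the sketch below maintains a Normal-Normal posterior on the average treatment effect and, at each interim look, compares the utility of stopping now (launch or roll back) against a one-step-lookahead estimate of the value of continuing, minus a per-period opportunity cost. The conjugate model, the launch/roll-back utility, the lookahead approximation, and all numeric parameters are illustrative assumptions, not the framework deployed in the paper.

```python
import numpy as np

# Minimal sketch of Bayesian sequential monitoring with a one-step-lookahead
# stopping rule. The Normal-Normal effect model, the launch/roll-back utility,
# and the per-period opportunity cost are illustrative assumptions, not the
# authors' production design.

def normal_update(mu0, var0, batch_mean, batch_var):
    """Conjugate Normal-Normal update of the treatment-effect posterior."""
    post_var = 1.0 / (1.0 / var0 + 1.0 / batch_var)
    post_mean = post_var * (mu0 / var0 + batch_mean / batch_var)
    return post_mean, post_var

def terminal_utility(post_mean):
    """Best terminal action: launch if the posterior mean effect is positive
    (expected gain = post_mean), otherwise roll back (gain 0)."""
    return max(post_mean, 0.0)

def continue_value(post_mean, post_var, batch_var, cost, n_sims=5000, seed=0):
    """Preposterior value of collecting one more batch of data, net of the
    opportunity cost of keeping the experiment running another period."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(post_mean, np.sqrt(post_var), n_sims)  # effect draws
    future_batches = rng.normal(theta, np.sqrt(batch_var))    # simulated data
    future_means, _ = normal_update(post_mean, post_var, future_batches, batch_var)
    return float(np.mean(np.maximum(future_means, 0.0))) - cost

# Example: monitor weekly batches and stop as soon as the lookahead value of
# continuing no longer exceeds the value of acting on the current posterior.
mu, var = 0.0, 1.0      # prior on the average treatment effect (assumed)
batch_var = 0.25        # variance of each weekly batch mean (assumed)
cost = 0.02             # opportunity cost per extra week (assumed)
rng = np.random.default_rng(42)

for week in range(1, 13):
    obs = rng.normal(0.3, np.sqrt(batch_var))  # data from a true effect of 0.3
    mu, var = normal_update(mu, var, obs, batch_var)
    if terminal_utility(mu) >= continue_value(mu, var, batch_var, cost):
        print(f"stop at week {week}: posterior mean effect {mu:.3f}")
        break
else:
    print(f"reached the horizon: posterior mean effect {mu:.3f}")
```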
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z) - Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods independently score and select data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Efficient Real-world Testing of Causal Decision Making via Bayesian
Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method is used for the data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z) - Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z) - Towards the D-Optimal Online Experiment Design for Recommender Selection [18.204325860752768]
Finding the optimal online experiment is nontrivial since both the users and displayed recommendations carry contextual features that are informative to the reward.
We leverage the D-optimal design from the classical statistics literature to achieve the maximum information gain during exploration.
We then use our deployment example on Walmart.com to fully illustrate the practical insights and effectiveness of the proposed methods.
arXiv Detail & Related papers (2021-10-23T04:30:27Z) - Diffusion Approximations for a Class of Sequential Testing Problems [0.0]
We study the problem of a seller who wants to select an optimal assortment of products to launch into the marketplace.
Motivated by emerging practices in e-commerce, we assume that the seller is able to use a crowdvoting system to learn these preferences.
arXiv Detail & Related papers (2021-02-13T23:21:29Z) - Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)