Balancing Risk and Reward: An Automated Phased Release Strategy
- URL: http://arxiv.org/abs/2305.09626v1
- Date: Tue, 16 May 2023 17:27:34 GMT
- Title: Balancing Risk and Reward: An Automated Phased Release Strategy
- Authors: Yufan Li, Jialiang Mao, Iavor Bojinov
- Abstract summary: Phased releases are a common strategy in the technology industry for gradually releasing new products or updates through a sequence of A/B tests.
We propose an algorithm that automatically determines the release percentage at each stage in the schedule, balancing the need to control risk while maximizing ramp-up speed.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phased releases are a common strategy in the technology industry for
gradually releasing new products or updates through a sequence of A/B tests in
which the number of treated units gradually grows until full deployment or
deprecation. Performing phased releases in a principled way requires selecting
the proportion of units assigned to the new release in a way that balances the
risk of an adverse effect with the need to iterate and learn from the
experiment rapidly. In this paper, we formalize this problem and propose an
algorithm that automatically determines the release percentage at each stage in
the schedule, balancing the need to control risk while maximizing ramp-up
speed. Our framework models the challenge as a constrained batched bandit
problem that ensures that our pre-specified experimental budget is not depleted
with high probability. Our proposed algorithm leverages an adaptive Bayesian
approach in which the maximal number of units assigned to the treatment is
determined by the posterior distribution, ensuring that the probability of
depleting the remaining budget is low. Notably, our approach analytically
solves the ramp sizes by inverting probability bounds, eliminating the need for
challenging rare-event Monte Carlo simulation. It only requires computing means
and variances of outcome subsets, making it highly efficient and
parallelizable.
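The key computational step, choosing the next ramp size so that the risk budget survives with high probability, admits a small illustrative sketch. The following assumes a Gaussian posterior over the average adverse effect and a standard normal tail bound; function and parameter names such as max_ramp_fraction and remaining_budget are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the paper's exact algorithm): a posterior-driven ramp-size
# rule assuming a Gaussian posterior over the average adverse treatment effect.
# All names below are illustrative.
import numpy as np
from scipy import stats


def max_ramp_fraction(treated_outcomes, control_outcomes,
                      remaining_budget, next_batch_size,
                      alpha=0.05, prior_var=1.0):
    """Largest treatment fraction for the next batch such that the posterior
    probability of exhausting the remaining risk budget is at most alpha."""
    # Posterior over the per-unit adverse effect, using only the means and
    # variances of the outcome subsets (normal-normal update, zero-mean prior).
    diff_mean = np.mean(treated_outcomes) - np.mean(control_outcomes)
    diff_var = (np.var(treated_outcomes, ddof=1) / len(treated_outcomes)
                + np.var(control_outcomes, ddof=1) / len(control_outcomes))
    post_var = 1.0 / (1.0 / prior_var + 1.0 / diff_var)
    post_mean = post_var * diff_mean / diff_var

    # Treating p * n units incurs a total effect of roughly p * n * theta with
    # theta ~ N(post_mean, post_var). Inverting the Gaussian tail bound,
    # P(p * n * theta > budget) <= alpha reduces to
    #   p * n * (post_mean + z_{1-alpha} * sqrt(post_var)) <= budget.
    z = stats.norm.ppf(1.0 - alpha)
    upper_per_unit = post_mean + z * np.sqrt(post_var)
    if upper_per_unit <= 0:
        return 1.0  # even the upper confidence limit is non-adverse: ramp fully
    p = remaining_budget / (next_batch_size * upper_per_unit)
    return float(np.clip(p, 0.0, 1.0))


# Example: the previous batch shows a small adverse effect; cap the next ramp.
rng = np.random.default_rng(0)
treated = rng.normal(0.02, 1.0, size=500)   # adverse effect of about +0.02 per unit
control = rng.normal(0.00, 1.0, size=500)
print(max_ramp_fraction(treated, control, remaining_budget=50.0,
                        next_batch_size=10_000, alpha=0.05))
```

Solving the tail-bound inequality in closed form for the treatment fraction is what removes the need for rare-event Monte Carlo simulation in this sketch: only subset means and variances enter the computation, so the rule is cheap to evaluate and easy to parallelize across stages.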
Related papers
- Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing [14.260646140460187]
We study the timeliness of computation-intensive updates and jointly optimize the task updating and offloading policies to minimize the age of information (AoI).
Specifically, we consider edge load dynamics and formulate a task scheduling problem to minimize the expected time-average AoI.
Our proposed algorithms reduce the average AoI by up to 52.6% compared with the best baseline algorithm in our experiments.
arXiv Detail & Related papers (2024-09-25T11:33:32Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- STEEL: Singularity-aware Reinforcement Learning [14.424199399139804]
Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy.
We propose a new batch RL algorithm that allows for singularity in both state and action spaces.
By leveraging the idea of pessimism and under some technical conditions, we derive a first finite-sample regret guarantee for our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T18:29:35Z)
- Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
- A Deep Reinforcement Learning Approach to Rare Event Estimation [30.670114229970526]
An important step in the design of autonomous systems is to evaluate the probability that a failure will occur.
In safety-critical domains, the failure probability is extremely small, so evaluating a policy through Monte Carlo sampling is inefficient.
We develop two adaptive importance sampling algorithms that can efficiently estimate the probability of rare events for sequential decision making systems.
arXiv Detail & Related papers (2022-11-22T18:29:14Z)
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement affects the sample complexity needed to achieve the desired optimality of the obtained policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework and develop two algorithms, coined Tabular-SWEET and Low-rank-SWEET.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.