Correlated Bandits for Dynamic Pricing via the ARC algorithm
- URL: http://arxiv.org/abs/2102.04263v1
- Date: Mon, 8 Feb 2021 14:54:26 GMT
- Title: Correlated Bandits for Dynamic Pricing via the ARC algorithm
- Authors: Samuel Cohen and Tanut Treetanthiploet
- Abstract summary: Asymptotic Randomised Control provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits.
It allows a decision maker to observe signals in addition to their rewards, to incorporate correlations between the outcomes of different choices, and to have nontrivial dynamics for their estimates.
- Score: 2.7564955518050693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Asymptotic Randomised Control (ARC) algorithm provides a rigorous
approximation to the optimal strategy for a wide class of Bayesian bandits,
while retaining reasonable computational complexity. In particular, it allows a
decision maker to observe signals in addition to their rewards, to incorporate
correlations between the outcomes of different choices, and to have nontrivial
dynamics for their estimates. The algorithm is guaranteed to asymptotically
optimise the expected discounted payoff, with error depending on the initial
uncertainty of the bandit. In this paper, we consider a batched bandit problem
where observations arrive from a generalised linear model; we extend the ARC
algorithm to this setting. We apply this to a classic dynamic pricing problem
based on a Bayesian hierarchical model and demonstrate that the ARC algorithm
outperforms alternative approaches.
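The ARC strategy itself requires solving an asymptotic control problem and is beyond a short sketch, but the mechanism it exploits, namely that observing one arm updates beliefs about all correlated arms, can be illustrated with Thompson sampling under a joint Gaussian prior. This is a generic stand-in, not the ARC algorithm; the arm means, prior covariance, and noise variance below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_correlated(mu0, Sigma0, true_means, noise_var=1.0, horizon=500):
    """Thompson sampling with a joint Gaussian prior over the arm means.

    Pulling one arm updates the posterior over *all* arms through the
    prior covariance: the 'correlated bandits' effect described in the
    abstract. (Generic illustration, not the ARC rule itself.)
    """
    mu, Sigma = mu0.astype(float), Sigma0.astype(float).copy()
    total = 0.0
    for _ in range(horizon):
        sample = rng.multivariate_normal(mu, Sigma)  # draw from posterior
        a = int(np.argmax(sample))                   # act greedily on the draw
        r = true_means[a] + rng.normal(0.0, noise_var ** 0.5)
        total += r
        # Conjugate (Kalman-style) rank-one update for observing arm a
        s = Sigma[:, a].copy()
        gain = s / (Sigma[a, a] + noise_var)
        mu = mu + gain * (r - mu[a])
        Sigma = Sigma - np.outer(gain, s)
    return mu, total

mu0 = np.zeros(3)
Sigma0 = np.array([[1.0, 0.8, 0.0],   # arms 0 and 1 are correlated
                   [0.8, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
post_mu, _ = thompson_correlated(mu0, Sigma0, np.array([0.2, 0.9, 0.4]))
```

Because arms 0 and 1 are correlated in the prior, rewards observed from either one sharpen the posterior for both, which is what allows fewer pulls of suboptimal arms than an independent-arms model.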
Related papers
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
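The summary above does not restate the index itself; for reference, a textbook form of the IMED index for Gaussian rewards with known variance (not taken from this paper) is $N_a \cdot \mathrm{KL}(\hat\mu_a, \hat\mu_{\max}) + \log N_a$, with the arm of smallest index played next:

```python
import math

def imed_indices(means, counts, sigma2=1.0):
    """IMED indices for Gaussian rewards with known variance sigma2.

    Index_a = N_a * KL(mu_a, mu_max) + log(N_a); the arm with the
    SMALLEST index is played next. The KL divergence between two
    Gaussians of equal variance is (mu_a - mu_max)^2 / (2 * sigma2).
    """
    mu_max = max(means)
    indices = []
    for mu, n in zip(means, counts):
        kl = (mu - mu_max) ** 2 / (2.0 * sigma2)
        indices.append(n * kl + math.log(n))
    return indices

# The empirically best arm has KL = 0, so its index is just log(N_a);
# under-sampled suboptimal arms keep small indices and get re-pulled.
vals = imed_indices([0.4, 0.9, 0.5], [30, 60, 10])
best_next = vals.index(min(vals))
```

Here arm 2 is pulled next: despite a worse empirical mean than arm 1, its small count keeps its index lowest, which is how IMED balances exploration and exploitation without explicit confidence bounds.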
arXiv Detail & Related papers (2024-05-24T04:11:58Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance.
We derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation.
We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
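The explore-then-commit template referenced in this entry is simple enough to sketch. The version below, with a fixed budget of m pulls per arm followed by committing to the empirical best, is the generic textbook schedule, not the paper's AEtC variant; the arm means are invented.

```python
import random

random.seed(1)

def explore_then_commit(arm_means, m, horizon):
    """Explore-then-commit: pull each arm m times uniformly,
    then commit to the empirically best arm for the remaining rounds.
    Rewards are simulated as Gaussian with unit variance."""
    k = len(arm_means)
    totals = [0.0] * k
    reward = 0.0
    t = 0
    # Exploration phase: m rounds per arm
    for a in range(k):
        for _ in range(m):
            r = arm_means[a] + random.gauss(0, 1)
            totals[a] += r
            reward += r
            t += 1
    # Commit phase: play the empirical best for the rest of the horizon
    best = max(range(k), key=lambda a: totals[a] / m)
    while t < horizon:
        reward += arm_means[best] + random.gauss(0, 1)
        t += 1
    return best, reward

best, _ = explore_then_commit([0.1, 0.8], m=50, horizon=500)
```

The adaptive variant in the abstract replaces the fixed m with a data-driven stopping rule for the exploration phase; the fixed-m version shown here is the baseline it improves on.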
arXiv Detail & Related papers (2023-06-19T15:29:32Z) - A Novel Framework for Improving the Breakdown Point of Robust Regression Algorithms [1.9594639581421422]
We present an effective framework for improving the breakdown point of robust regression algorithms.
We derive a consistent robust regression algorithm with iterative local search (CORALS).
arXiv Detail & Related papers (2023-05-20T15:59:33Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first study of the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Robust recovery for stochastic block models [16.74630355427558]
We develop an efficient algorithm for weak recovery in a robust version of the block model.
Our results show that there is no price of robustness in the block model.
arXiv Detail & Related papers (2021-11-16T15:43:00Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
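The successive-elimination template this entry builds on can be sketched generically: sample every active arm each round, then drop any arm whose upper confidence bound falls below the best lower confidence bound. The confidence radius below is a standard anytime bound for unit-variance sub-Gaussian rewards, not the constants or the contamination handling from this paper.

```python
import math
import random

random.seed(0)

def successive_elimination(arm_means, delta=0.05, max_rounds=2000):
    """Successive elimination for best-arm identification.

    Each round, sample every surviving arm once; eliminate any arm
    whose UCB drops below the best LCB. Returns the surviving arm
    (or the empirical best if the round budget runs out)."""
    k = len(arm_means)
    active = set(range(k))
    sums = [0.0] * k
    n = 0
    while len(active) > 1 and n < max_rounds:
        n += 1
        for a in active:
            sums[a] += arm_means[a] + random.gauss(0, 1)
        # Anytime confidence radius for unit-variance sub-Gaussian rewards
        rad = math.sqrt(2.0 * math.log(4.0 * k * n * n / delta) / n)
        means = {a: sums[a] / n for a in active}
        best_lcb = max(means[a] - rad for a in active)
        active = {a for a in active if means[a] + rad >= best_lcb}
    return max(active, key=lambda a: sums[a] / n)

best_arm = successive_elimination([0.0, 1.0, 0.2])
```

The gap-based alternative in the abstract instead adapts its sampling to the observed gaps rather than sampling all active arms uniformly, which is what buys the constant-factor-optimal sample complexity.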
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - An Efficient Algorithm for Deep Stochastic Contextual Bandits [10.298368632706817]
In contextual bandit problems, an agent selects an action based on certain observed context to maximize the reward over iterations.
Recently, there have been a few studies using a deep neural network (DNN), trained by a gradient-based method, to predict the expected reward of an action.
arXiv Detail & Related papers (2021-04-12T16:34:43Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
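As a point of reference for this setting, the classic LinUCB baseline (not the primal-dual incremental algorithm of the paper) combines a ridge-regression estimate of the unknown parameter with an optimism bonus for under-explored directions. The parameter vector, horizon, and noise level below are invented for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def linucb(theta_true, horizon=300, alpha=1.0, n_arms=5):
    """LinUCB for contextual linear bandits: score each arm by its
    predicted reward plus alpha times its width ||x||_{A^{-1}}.
    (A textbook baseline for this setting, not the paper's algorithm.)"""
    d = len(theta_true)
    A = np.eye(d)            # regularised Gram matrix (ridge prior)
    b = np.zeros(d)
    total = 0.0
    for _ in range(horizon):
        contexts = rng.normal(size=(n_arms, d))   # fresh arm features
        Ainv = np.linalg.inv(A)
        theta_hat = Ainv @ b
        # Optimistic score: predicted reward + exploration width
        widths = np.sqrt(np.einsum('ij,jk,ik->i', contexts, Ainv, contexts))
        a = int(np.argmax(contexts @ theta_hat + alpha * widths))
        x = contexts[a]
        r = float(x @ theta_true) + rng.normal(0.0, 0.1)
        total += r
        A += np.outer(x, x)   # rank-one update of the Gram matrix
        b += r * x
    return np.linalg.solve(A, b), total

theta_hat, _ = linucb(np.array([1.0, -0.5, 0.3]))
```

After a few hundred rounds the ridge estimate recovers the true parameter closely; improvements such as the primal-dual incremental approach target the regret constants and minimax optimality rather than this basic estimate-plus-bonus structure.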
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.