Bandit algorithms to emulate human decision making using probabilistic
distortions
- URL: http://arxiv.org/abs/1611.10283v3
- Date: Tue, 31 Oct 2023 05:53:46 GMT
- Title: Bandit algorithms to emulate human decision making using probabilistic
distortions
- Authors: Ravi Kumar Kolla, Prashanth L.A., Aditya Gopalan, Krishna Jagannathan,
Michael Fu and Steve Marcus
- Abstract summary: We formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider these problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
- Score: 20.422725678982726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by models of human decision making proposed to explain commonly
observed deviations from conventional expected value preferences, we formulate
two stochastic multi-armed bandit problems with distorted probabilities on the
reward distributions: the classic $K$-armed bandit and the linearly
parameterized bandit settings. We consider these problems in both the regret
minimization and the best arm identification frameworks for multi-armed
bandits. For the regret minimization setting in $K$-armed as well
as linear bandit problems, we propose algorithms that are inspired by Upper
Confidence Bound (UCB) algorithms, incorporate reward distortions, and exhibit
sublinear regret. For the $K$-armed bandit setting, we derive an upper bound on
the expected regret for our proposed algorithm, and then we prove a matching
lower bound to establish the order-optimality of our algorithm. For the
linearly parameterized setting, our algorithm achieves a regret upper bound of
the same order as that of the standard Optimism in the Face of Uncertainty
Linear bandit (OFUL) algorithm and, unlike OFUL, handles distortions and an
arm-dependent noise model. For
the best arm identification problem in the $K$-armed bandit setting, we propose
algorithms, derive guarantees on their performance, and also show that these
algorithms are order optimal by proving matching fundamental limits on
performance. For best arm identification in linear bandits, we propose an
algorithm and establish sample complexity guarantees. Finally, we present
simulation experiments which demonstrate the advantages resulting from using
distortion-aware learning algorithms in a vehicular traffic routing
application.
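The distortion here replaces an arm's expected value with a weighted value: for a bounded, nonnegative reward $X$ and a probability weighting function $w$, the distorted value is $\int_0^\infty w\big(P(X > x)\big)\,dx$, which can be estimated from the order statistics of the observed rewards. Below is a minimal sketch of a distortion-aware UCB rule in this spirit; it is not the paper's exact algorithm, and the Tversky-Kahneman weighting function, the exploration constant, and the toy arms are illustrative assumptions.

```python
import numpy as np

def weight(p, alpha=0.61):
    """Tversky-Kahneman probability weighting function (illustrative choice):
    overweights small probabilities and underweights large ones."""
    return p**alpha / (p**alpha + (1 - p)**alpha) ** (1 / alpha)

def distorted_value(samples):
    """Estimate the distorted value  int_0^inf w(P(X > x)) dx  of a bounded,
    nonnegative reward from i.i.d. samples via the weighted empirical CDF:
    rho_hat = sum_k X_(k) * [w((n-k+1)/n) - w((n-k)/n)]."""
    x = np.sort(np.asarray(samples))        # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    k = np.arange(1, n + 1)
    mass = weight((n - k + 1) / n) - weight((n - k) / n)
    return float(np.dot(x, mass))

def distorted_ucb(arms, horizon, c=2.0, rng=np.random.default_rng(0)):
    """UCB-style play against distorted values: pull the arm maximizing the
    estimated distorted value plus an exploration bonus. The paper calibrates
    the bonus to concentration properties of the distorted-value estimator;
    the constant c here is a placeholder."""
    pulls = [[arm(rng)] for arm in arms]    # pull each arm once to initialize
    for t in range(len(arms), horizon):
        index = [distorted_value(r) + c * np.sqrt(np.log(t + 1) / len(r))
                 for r in pulls]
        i = int(np.argmax(index))
        pulls[i].append(arms[i](rng))
    return [len(r) for r in pulls]          # pull counts per arm

# Two toy arms with the same mean (0.5) but very different tails: under the
# distortion, the rare-large-reward arm has a higher distorted value
# (roughly 1.3 vs. roughly 0.5), so the learner concentrates on it.
arms = [lambda g: g.uniform(0.4, 0.6),          # low-variance arm
        lambda g: g.binomial(1, 0.05) * 10.0]   # rare large reward
print(distorted_ucb(arms, horizon=2000))
```

The weighted-order-statistics estimate is the standard way to evaluate a distorted expectation from samples; its concentration behavior is what drives UCB-style analyses in this setting.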
Related papers
- Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the $K$-armed dueling bandit problem in both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (see the sketch after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
The resulting regret additionally depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
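On the "meta-UCB" procedure flagged above: the following is a minimal sketch of the general idea of treating $N$ base bandit algorithms as arms of a higher-level UCB1 problem. It is not the cited paper's exact procedure (which also corrects the base algorithms' confidence estimates), and the `select()`/`update()` interface assumed for the base algorithms is hypothetical.

```python
import numpy as np

class MetaUCB:
    """Treat each of N base bandit algorithms as an arm in a higher-level
    N-armed UCB1 problem: the meta-level picks a base algorithm, that base
    algorithm picks the actual arm, and both levels observe the reward."""

    def __init__(self, base_algs):
        self.base = base_algs                   # assumed select()/update() interface
        self.counts = np.zeros(len(base_algs))  # meta-level pull counts
        self.sums = np.zeros(len(base_algs))    # meta-level reward sums
        self.t = 0

    def select(self):
        self.t += 1
        if self.t <= len(self.base):            # initialization: try each base alg once
            self.i = self.t - 1
        else:                                   # standard UCB1 index over base algs
            index = self.sums / self.counts + np.sqrt(2 * np.log(self.t) / self.counts)
            self.i = int(np.argmax(index))
        return self.base[self.i].select()       # chosen base alg picks the arm

    def update(self, reward):
        self.counts[self.i] += 1
        self.sums[self.i] += reward
        self.base[self.i].update(reward)        # only the chosen base alg learns
```

Wrapping, say, a UCB instance and a Thompson sampling instance this way lets the meta-level concentrate on whichever base algorithm actually earns more reward on the environment at hand.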