Greedy Algorithm almost Dominates in Smoothed Contextual Bandits
- URL: http://arxiv.org/abs/2005.10624v2
- Date: Mon, 27 Dec 2021 18:30:23 GMT
- Title: Greedy Algorithm almost Dominates in Smoothed Contextual Bandits
- Authors: Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei
Steven Wu
- Abstract summary: Online learning algorithms must balance exploration and exploitation.
We show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm.
- Score: 100.09904315064372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online learning algorithms, widely used to power search and content
optimization on the web, must balance exploration and exploitation, potentially
sacrificing the experience of current users in order to gain information that
will lead to better decisions in the future. While necessary in the worst case,
explicit exploration has a number of disadvantages compared to the greedy
algorithm that always "exploits" by choosing an action that currently looks
optimal. We ask under what conditions inherent diversity in the data makes
explicit exploration unnecessary. We build on a recent line of work on the
smoothed analysis of the greedy algorithm in the linear contextual bandits
model. We improve on prior results to show that a greedy approach almost
matches the best possible Bayesian regret rate of any other algorithm on the
same problem instance whenever the diversity conditions hold, and that this
regret is at most $\tilde O(T^{1/3})$.
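For concreteness, here is a minimal sketch of the greedy rule in the linear contextual bandit model (an illustration under assumed Gaussian contexts and noise, not the paper's exact construction): each arm's reward is linear in its observed context, the learner maintains per-arm ridge-regression estimates, and it always pulls the arm with the highest predicted reward, with no exploration bonus.

```python
import numpy as np

def greedy_linear_contextual_bandit(T=1000, K=5, d=10, lam=1.0, seed=0):
    """Greedy (pure exploitation) rule for linear contextual bandits.

    Illustrative sketch: true per-arm parameters are drawn at random, contexts
    are i.i.d. Gaussian (a stand-in for the diversity the smoothed model posits),
    and rewards are linear plus noise.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(K, d)) / np.sqrt(d)        # unknown true parameters
    # Per-arm ridge statistics: A_a = lam*I + sum x x^T, b_a = sum r x
    A = np.stack([lam * np.eye(d) for _ in range(K)])
    b = np.zeros((K, d))
    regret = 0.0
    for t in range(T):
        X = rng.normal(size=(K, d))                     # one context vector per arm
        theta_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(K)])
        est = np.einsum("ad,ad->a", theta_hat, X)       # predicted reward per arm
        a = int(np.argmax(est))                         # greedy: no exploration bonus
        mean = np.einsum("ad,ad->a", theta, X)          # true means (for regret only)
        r = mean[a] + rng.normal(scale=0.1)
        A[a] += np.outer(X[a], X[a])
        b[a] += r * X[a]
        regret += mean.max() - mean[a]
    return regret

print("cumulative regret:", greedy_linear_contextual_bandit())
```

With diverse (here, Gaussian) contexts standing in for the paper's smoothed contexts, the per-arm estimates keep improving even without explicit exploration; with degenerate contexts the greedy rule can lock onto a suboptimal arm, which is the worst case that motivates the smoothed analysis.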
Related papers
- Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
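For reference, a generic sketch of the optimism principle for linear contextual bandits (a LinUCB-style decision rule; the bonus scale `alpha` is an assumed constant, and this is not the paper's corruption-robust algorithm):

```python
import numpy as np

def ucb_choice(A, b, X, alpha=1.0):
    """One round of optimism in the face of uncertainty for a linear bandit.

    A: (d, d) ridge Gram matrix, b: (d,) response vector, X: (K, d) contexts.
    Returns the arm maximizing predicted reward plus a confidence width.
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    means = X @ theta_hat
    widths = alpha * np.sqrt(np.einsum("ki,ij,kj->k", X, A_inv, X))
    return int(np.argmax(means + widths))
```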
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them to runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon.
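A naive sketch of casting online algorithm selection as a bandit over candidate solvers, with runtimes censored at a cutoff (an assumed simplification, not the paper's estimator; `run_solver` is a hypothetical callback):

```python
import numpy as np

def select_solver_ucb(run_solver, n_solvers, n_instances, cutoff=60.0):
    """UCB over candidate solvers using capped (censored) runtimes as losses.

    run_solver(j, t) is a hypothetical callback returning the runtime of
    solver j on instance t, already capped at `cutoff` if it timed out.
    """
    counts = np.zeros(n_solvers)
    mean_runtime = np.zeros(n_solvers)
    for t in range(n_instances):
        if t < n_solvers:
            j = t                                        # try every solver once
        else:
            bonus = cutoff * np.sqrt(2.0 * np.log(t + 1) / counts)
            j = int(np.argmin(mean_runtime - bonus))     # optimistically low runtime
        runtime = min(run_solver(j, t), cutoff)          # censor at the cutoff
        counts[j] += 1
        mean_runtime[j] += (runtime - mean_runtime[j]) / counts[j]
    return int(np.argmin(mean_runtime))
```

Only a running mean and a count are stored per solver, so memory does not grow with the time horizon.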
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Algorithmic Challenges in Ensuring Fairness at the Time of Decision [6.228560624452748]
Algorithmic decision-making in societal contexts can be framed as optimization under bandit feedback.
Recent lawsuits accuse companies that deploy algorithmic pricing practices of price gouging.
We introduce the well-studied fairness notion of envy-freeness within the context of convex optimization.
arXiv Detail & Related papers (2021-03-16T19:06:28Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Lenient Regret and Good-Action Identification in Gaussian Process
Bandits [43.03669155559218]
We study the problem of Gaussian process (GP) bandits under a relaxed optimization criterion stating that any function value above a certain threshold is "good enough".
On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold.
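A self-contained sketch of one way a known threshold can be exploited for good-action identification (an illustration only, not one of the paper's algorithms; the RBF kernel, noise level, confidence scale `beta`, and objective callback `f` are assumptions): fit a Gaussian-process posterior over a finite candidate set and stop as soon as some candidate's lower confidence bound clears the threshold, otherwise query the upper-confidence-bound maximizer.

```python
import numpy as np

def rbf(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def find_good_action(f, candidates, threshold, noise=0.05, beta=2.0, seed=0):
    """Query points until the GP lower confidence bound certifies a 'good' action.

    candidates: (N, d) array of points; f: assumed (noisy) objective callback.
    """
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    X, y = [], []
    for _ in range(len(candidates)):
        if not X:
            x = candidates[rng.integers(len(candidates))]      # first query: random
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(X))
            Ks = rbf(candidates, Xa)
            mu = Ks @ np.linalg.solve(K, ya)                   # posterior mean
            var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
            sd = np.sqrt(np.maximum(var, 1e-12))
            good = np.flatnonzero(mu - beta * sd >= threshold)
            if good.size:                                      # lower bound clears it
                return candidates[good[0]]
            x = candidates[int(np.argmax(mu + beta * sd))]     # else query UCB maximizer
        X.append(x)
        y.append(f(x) + rng.normal(scale=noise))
    return None
```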
arXiv Detail & Related papers (2021-02-11T01:16:58Z) - Be Greedy in Multi-Armed Bandits [22.301793734117805]
The Greedy algorithm is the simplest heuristic for sequential decision problems: it takes the locally optimal choice at each round.
We provide a generic worst-case bound on the regret of the Greedy algorithm.
We prove that it verifies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems.
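A minimal sketch of that rule in the plain multi-armed case (illustrative Gaussian rewards; pull each arm once, then always play the highest empirical mean):

```python
import numpy as np

def greedy_bandit(means, T=10000, seed=0):
    """Pure-greedy multi-armed bandit: after one pull of each arm,
    always play the arm with the highest empirical mean."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    estimates = np.zeros(K)
    regret = 0.0
    for t in range(T):
        a = t if t < K else int(np.argmax(estimates))    # greedy choice
        r = rng.normal(means[a], 1.0)
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]   # running-mean update
        regret += max(means) - means[a]
    return regret

print(greedy_bandit(means=[0.2, 0.5, 0.8]))
```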
arXiv Detail & Related papers (2021-01-04T16:47:02Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
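A naive sketch of the general idea of selecting between base bandit algorithms online (an EXP3-style master with importance-weighted updates; this is not the paper's meta-algorithm, and `bases`/`env_step` are hypothetical interfaces):

```python
import numpy as np

def meta_select(bases, env_step, T, eta=0.05, seed=0):
    """EXP3-style master over base bandit algorithms: sample a base, play its
    action, and update only that base with an importance-weighted reward."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(len(bases))
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        i = rng.choice(len(bases), p=p)           # sample a base algorithm
        arm = bases[i].choose()                   # the sampled base picks the action
        reward = env_step(arm)                    # hypothetical environment callback
        bases[i].update(arm, reward)              # only the sampled base learns
        log_w[i] += eta * reward / p[i]           # importance-weighted update
    return int(np.argmax(log_w))
```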
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.