Tractable contextual bandits beyond realizability
- URL: http://arxiv.org/abs/2010.13013v2
- Date: Fri, 26 Feb 2021 00:08:36 GMT
- Title: Tractable contextual bandits beyond realizability
- Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, and Susan Athey
- Abstract summary: We present a tractable bandit algorithm that is not sensitive to the realizability assumption.
Our algorithm matches the regret guarantees that realizability-based algorithms achieve under realizability, up to an additive misspecification term.
- Score: 15.40239357726009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tractable contextual bandit algorithms often rely on the realizability
assumption - i.e., that the true expected reward model belongs to a known
class, such as linear functions. In this work, we present a tractable bandit
algorithm that is not sensitive to the realizability assumption and
computationally reduces to solving a constrained regression problem in every
epoch. When realizability does not hold, our algorithm ensures the same
guarantees on regret achieved by realizability-based algorithms under
realizability, up to an additive term that accounts for the misspecification
error. This extra term is proportional to T times a function of the mean
squared error between the best model in the class and the true model, where T
is the total number of time-steps. Our work sheds light on the bias-variance
trade-off for tractable contextual bandits. This trade-off is not captured by
algorithms that assume realizability, since under this assumption there exists
an estimator in the class that attains zero bias.
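The regret guarantee described in the abstract can be sketched as follows (illustrative notation, not the paper's exact statement), where $\mathcal{F}$ is the known model class, $f^\star$ the true expected-reward model, and $T$ the horizon:

```latex
\mathrm{Regret}(T)
  \;\lesssim\;
  \underbrace{\mathrm{Regret}_{\mathrm{realizable}}(T)}_{\text{guarantee under realizability}}
  \;+\;
  T \cdot \phi\Big( \min_{f \in \mathcal{F}}
      \mathbb{E}\big[ (f(x, a) - f^\star(x, a))^2 \big] \Big)
```

Here $\phi$ stands for the (unspecified here) function of the mean-squared misspecification error; under realizability the minimum is zero, the extra term vanishes, and the standard guarantee is recovered.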
Related papers
- Efficient and Interpretable Bandit Algorithms [18.99853072645046]
A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter.
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.
arXiv Detail & Related papers (2023-10-23T09:36:13Z)
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z)
- ReLU Regression with Massart Noise [52.10842036932169]
We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data.
We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model.
We develop an efficient algorithm that achieves exact parameter recovery in this model.
arXiv Detail & Related papers (2021-09-10T02:13:22Z)
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- Adapting to misspecification in contextual bandits with offline regression oracles [7.312170216336086]
We propose a family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy.
Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level.
arXiv Detail & Related papers (2021-02-26T00:15:04Z)
- Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence.
We prove that logistic regression is inherently over-confident in the realizable, under-parametrized setting.
Perhaps surprisingly, we also show that over-confidence is not always the case.
arXiv Detail & Related papers (2021-02-15T21:38:09Z)
- Bandit algorithms: Letting go of logarithmic regret for statistical robustness [0.0]
We study regret in a multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness.
We show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret.
arXiv Detail & Related papers (2020-06-22T07:18:47Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
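The regularized estimator described above can be sketched as ridge regression whose penalty pulls the parameter toward a bias vector b rather than toward zero. This is a minimal illustration under that reading; the function name, data, and closed-form solver are this sketch's own, not the paper's:

```python
import numpy as np

def biased_ridge(X, y, b, lam):
    """Minimize ||y - X @ theta||^2 + lam * ||theta - b||^2 (closed form)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)  # regularized Gram matrix
    return np.linalg.solve(A, X.T @ y + lam * b)

# On noiseless data, regularizing toward the true parameter leaves the
# estimate exactly at that parameter, regardless of the strength lam.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true
theta_hat = biased_ridge(X, y, b=theta_true, lam=10.0)
```

With lam = 0 this reduces to ordinary least squares; as lam grows, the estimate is shrunk toward b, which is how a shared bias vector can transfer information across related tasks.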
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
- Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm [0.7614628596146599]
We show provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes.
We show a first-of-its-kind approximate recovery of the true label generating parameters under an (online) data-poisoning attack on the true labels.
Our guarantee is shown to be nearly optimal in the worst case and its accuracy of recovering the true weight degrades gracefully with increasing probability of attack and its magnitude.
arXiv Detail & Related papers (2020-05-08T17:59:23Z)
- Being Bayesian about Categorical Probability [6.875312133832079]
We treat the categorical probability over class labels as a random variable.
In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label.
Our method can be implemented as a plug-and-play loss function with negligible computational overhead.
arXiv Detail & Related papers (2020-02-19T02:35:32Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.