Tractable contextual bandits beyond realizability
- URL: http://arxiv.org/abs/2010.13013v2
- Date: Fri, 26 Feb 2021 00:08:36 GMT
- Title: Tractable contextual bandits beyond realizability
- Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, and Susan Athey
- Abstract summary: We present a tractable bandit algorithm that is not sensitive to the realizability assumption.
Our algorithm matches the regret guarantees that realizability-based algorithms achieve under realizability, up to an additive misspecification term.
- Score: 15.40239357726009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tractable contextual bandit algorithms often rely on the realizability
assumption - i.e., that the true expected reward model belongs to a known
class, such as linear functions. In this work, we present a tractable bandit
algorithm that is not sensitive to the realizability assumption and
computationally reduces to solving a constrained regression problem in every
epoch. When realizability does not hold, our algorithm ensures the same
guarantees on regret achieved by realizability-based algorithms under
realizability, up to an additive term that accounts for the misspecification
error. This extra term is proportional to T times a function of the mean
squared error between the best model in the class and the true model, where T
is the total number of time-steps. Our work sheds light on the bias-variance
trade-off for tractable contextual bandits. This trade-off is not captured by
algorithms that assume realizability, since under this assumption there exists
an estimator in the class that attains zero bias.
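The regret guarantee described in the abstract can be sketched as follows (illustrative notation, not the paper's exact statement), where $\mathcal{F}$ is the known model class, $f^\star$ the true expected-reward model, and $T$ the horizon:

```latex
\mathrm{Regret}(T)
  \;\lesssim\;
  \underbrace{\mathrm{Regret}_{\mathrm{realizable}}(T)}_{\text{guarantee under realizability}}
  \;+\;
  T \cdot \phi\Big( \min_{f \in \mathcal{F}}
      \mathbb{E}\big[ (f(x, a) - f^\star(x, a))^2 \big] \Big)
```

Here $\phi$ stands for the (unspecified here) function of the mean-squared misspecification error; under realizability the minimum is zero, the extra term vanishes, and the standard guarantee is recovered.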
Related papers
- Efficient and Interpretable Bandit Algorithms [18.99853072645046]
A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter.
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.
arXiv Detail & Related papers (2023-10-23T09:36:13Z)
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z)
- ReLU Regression with Massart Noise [52.10842036932169]
We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data.
We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model.
We develop an efficient algorithm that achieves exact parameter recovery in this model.
arXiv Detail & Related papers (2021-09-10T02:13:22Z)
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z)
- Adapting to misspecification in contextual bandits with offline regression oracles [7.312170216336086]
We propose a family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy.
Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level.
arXiv Detail & Related papers (2021-02-26T00:15:04Z)
- Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence.
We prove that logistic regression is inherently over-confident in the realizable, under-parametrized setting.
Perhaps surprisingly, we also show that over-confidence is not always the case.
arXiv Detail & Related papers (2021-02-15T21:38:09Z)
- Bandit algorithms: Letting go of logarithmic regret for statistical robustness [0.0]
We study regret in a multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness.
We show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret.
arXiv Detail & Related papers (2020-06-22T07:18:47Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
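The regularized estimator described above can be sketched as ridge regression whose penalty pulls the parameter toward a bias vector b rather than toward zero. This is a minimal illustration under that reading; the function name, data, and closed-form solver are this sketch's own, not the paper's:

```python
import numpy as np

def biased_ridge(X, y, b, lam):
    """Minimize ||y - X @ theta||^2 + lam * ||theta - b||^2 (closed form)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)  # regularized Gram matrix
    return np.linalg.solve(A, X.T @ y + lam * b)

# On noiseless data, regularizing toward the true parameter leaves the
# estimate exactly at that parameter, regardless of the strength lam.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true
theta_hat = biased_ridge(X, y, b=theta_true, lam=10.0)
```

With lam = 0 this reduces to ordinary least squares; as lam grows, the estimate is shrunk toward b, which is how a shared bias vector can transfer information across related tasks.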
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
- Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm [0.7614628596146599]
We show provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes.
We show a first-of-its-kind approximate recovery of the true label generating parameters under an (online) data-poisoning attack on the true labels.
Our guarantee is shown to be nearly optimal in the worst case and its accuracy of recovering the true weight degrades gracefully with increasing probability of attack and its magnitude.
arXiv Detail & Related papers (2020-05-08T17:59:23Z)
- Being Bayesian about Categorical Probability [6.875312133832079]
We treat the categorical probability over class labels as a random variable.
In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label.
Our method can be implemented as a plug-and-play loss function with negligible computational overhead.
arXiv Detail & Related papers (2020-02-19T02:35:32Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.