Efficient First-Order Contextual Bandits: Prediction, Allocation, and
Triangular Discrimination
- URL: http://arxiv.org/abs/2107.02237v1
- Date: Mon, 5 Jul 2021 19:20:34 GMT
- Authors: Dylan J. Foster and Akshay Krishnamurthy
- Abstract summary: A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
- Score: 82.52105963476703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recurring theme in statistical learning, online learning, and beyond is
that faster convergence rates are possible for problems with low noise, often
quantified by the performance of the best hypothesis; such results are known as
first-order or small-loss guarantees. While first-order guarantees are
relatively well understood in statistical and online learning, adapting to low
noise in contextual bandits (and more broadly, decision making) presents major
algorithmic challenges. In a COLT 2017 open problem, Agarwal, Krishnamurthy,
Langford, Luo, and Schapire asked whether first-order guarantees are even
possible for contextual bandits and -- if so -- whether they can be attained by
efficient algorithms. We give a resolution to this question by providing an
optimal and efficient reduction from contextual bandits to online regression
with the logarithmic (or, cross-entropy) loss. Our algorithm is simple and
practical, readily accommodates rich function classes, and requires no
distributional assumptions beyond realizability. In a large-scale empirical
evaluation, we find that our approach typically outperforms comparable
non-first-order methods.
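The reduction described above builds on inverse-gap-weighted action selection from the Foster-Rakhlin regression-oracle framework. A minimal sketch of that selection rule (function and variable names are illustrative; the paper's actual algorithm pairs this scheme with an online regression oracle trained under the logarithmic loss):

```python
import numpy as np

def inverse_gap_weighting(predicted_losses, gamma):
    """Turn per-action predicted losses into a sampling distribution
    via inverse gap weighting: actions with a larger predicted gap to
    the greedy action receive proportionally less probability, and
    gamma controls how aggressively the greedy action is favored."""
    losses = np.asarray(predicted_losses, dtype=float)
    n_actions = len(losses)
    greedy = int(np.argmin(losses))
    probs = np.zeros(n_actions)
    for a in range(n_actions):
        if a != greedy:
            probs[a] = 1.0 / (n_actions + gamma * (losses[a] - losses[greedy]))
    probs[greedy] = 1.0 - probs.sum()  # leftover mass goes to the greedy action
    return probs

# Exploration concentrates on the action with the smallest predicted loss.
p = inverse_gap_weighting([0.1, 0.5, 0.9], gamma=10.0)
```

Raising gamma over time shifts mass toward the greedy action, trading exploration for exploitation.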
On the technical side, we show that the logarithmic loss and an
information-theoretic quantity called the triangular discrimination play a
fundamental role in obtaining first-order guarantees, and we combine this
observation with new refinements to the regression oracle reduction framework
of Foster and Rakhlin. The use of triangular discrimination yields novel
results even for the classical statistical learning model, and we anticipate
that it will find broader use.
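For reference, the triangular discrimination between distributions $P$ and $Q$ on a countable domain is the f-divergence (a standard definition, stated here for context rather than taken from the abstract):

```latex
D_{\triangle}(P, Q) = \sum_{x} \frac{\big(P(x) - Q(x)\big)^{2}}{P(x) + Q(x)}
```

It is equivalent to the squared Hellinger distance up to constants, $2H^{2}(P,Q) \le D_{\triangle}(P,Q) \le 4H^{2}(P,Q)$, where $H^{2}(P,Q) = \tfrac{1}{2}\sum_{x} \big(\sqrt{P(x)} - \sqrt{Q(x)}\big)^{2}$.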
Related papers
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning [43.51182049644767]
Semi-supervised learning (SSL) has long proven to be an effective technique for constructing powerful models with limited labels.
Regularization-based methods, which force perturbed samples to have predictions similar to those of the original samples, have attracted much attention.
We propose a novel contrastive loss to guide the embedding of the learned network to change linearly between samples.
arXiv Detail & Related papers (2022-02-24T06:00:05Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
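The importance-sampling-weighted ERM scheme summarized above can be sketched as follows (a minimal illustration of the general idea, not the paper's exact estimator; all names and the toy data are assumptions):

```python
import numpy as np

def iw_policy_risk(policy, contexts, actions, losses, propensities):
    """Importance-weighted estimate of a policy's expected loss from
    logged (context, action, loss, logging-probability) tuples: each
    observed loss is reweighted by 1/propensity, which keeps the
    estimate unbiased even when the data were collected adaptively."""
    chosen = np.array([policy(x) for x in contexts])
    match = (chosen == np.asarray(actions)).astype(float)
    return float(np.mean(match * np.asarray(losses) / np.asarray(propensities)))

def weighted_erm(policies, contexts, actions, losses, propensities):
    """Return the policy in the class with the smallest weighted risk."""
    risks = [iw_policy_risk(pi, contexts, actions, losses, propensities)
             for pi in policies]
    return policies[int(np.argmin(risks))], risks

# Toy log collected by a uniform two-action logging policy.
contexts = [0, 1, 2, 3]
actions = [0, 1, 0, 1]
losses = [0.2, 0.9, 0.1, 0.8]
propensities = [0.5, 0.5, 0.5, 0.5]
policies = [lambda x: 0, lambda x: 1]
best, risks = weighted_erm(policies, contexts, actions, losses, propensities)
```

The 1/propensity weights are what allow model-agnostic guarantees to be recovered from adaptively collected data.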
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs [48.44657553192801]
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning-rate schedule, combined with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
arXiv Detail & Related papers (2020-06-14T22:09:27Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
- Noise-tolerant, Reliable Active Classification with Comparison Queries [25.725730509014355]
We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label.
We provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise.
arXiv Detail & Related papers (2020-01-15T19:00:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.