Efficient First-Order Contextual Bandits: Prediction, Allocation, and
Triangular Discrimination
- URL: http://arxiv.org/abs/2107.02237v1
- Date: Mon, 5 Jul 2021 19:20:34 GMT
- Authors: Dylan J. Foster and Akshay Krishnamurthy
- Abstract summary: A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
- Score: 82.52105963476703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recurring theme in statistical learning, online learning, and beyond is
that faster convergence rates are possible for problems with low noise, often
quantified by the performance of the best hypothesis; such results are known as
first-order or small-loss guarantees. While first-order guarantees are
relatively well understood in statistical and online learning, adapting to low
noise in contextual bandits (and more broadly, decision making) presents major
algorithmic challenges. In a COLT 2017 open problem, Agarwal, Krishnamurthy,
Langford, Luo, and Schapire asked whether first-order guarantees are even
possible for contextual bandits and -- if so -- whether they can be attained by
efficient algorithms. We give a resolution to this question by providing an
optimal and efficient reduction from contextual bandits to online regression
with the logarithmic (or, cross-entropy) loss. Our algorithm is simple and
practical, readily accommodates rich function classes, and requires no
distributional assumptions beyond realizability. In a large-scale empirical
evaluation, we find that our approach typically outperforms comparable
non-first-order methods.
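The reduction described above builds on inverse-gap-weighted action selection from the Foster-Rakhlin regression-oracle framework. A minimal sketch of that selection rule (function and variable names are illustrative; the paper's actual algorithm pairs this scheme with an online regression oracle trained under the logarithmic loss):

```python
import numpy as np

def inverse_gap_weighting(predicted_losses, gamma):
    """Turn per-action predicted losses into a sampling distribution
    via inverse gap weighting: actions with a larger predicted gap to
    the greedy action receive proportionally less probability, and
    gamma controls how aggressively the greedy action is favored."""
    losses = np.asarray(predicted_losses, dtype=float)
    n_actions = len(losses)
    greedy = int(np.argmin(losses))
    probs = np.zeros(n_actions)
    for a in range(n_actions):
        if a != greedy:
            probs[a] = 1.0 / (n_actions + gamma * (losses[a] - losses[greedy]))
    probs[greedy] = 1.0 - probs.sum()  # leftover mass goes to the greedy action
    return probs

# Exploration concentrates on the action with the smallest predicted loss.
p = inverse_gap_weighting([0.1, 0.5, 0.9], gamma=10.0)
```

Raising gamma over time shifts mass toward the greedy action, trading exploration for exploitation.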
On the technical side, we show that the logarithmic loss and an
information-theoretic quantity called the triangular discrimination play a
fundamental role in obtaining first-order guarantees, and we combine this
observation with new refinements to the regression oracle reduction framework
of Foster and Rakhlin. The use of triangular discrimination yields novel
results even for the classical statistical learning model, and we anticipate
that it will find broader use.
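For reference, the triangular discrimination between distributions $P$ and $Q$ on a countable domain is the f-divergence (a standard definition, stated here for context rather than taken from the abstract):

```latex
D_{\triangle}(P, Q) = \sum_{x} \frac{\big(P(x) - Q(x)\big)^{2}}{P(x) + Q(x)}
```

It is equivalent to the squared Hellinger distance up to constants, $2H^{2}(P,Q) \le D_{\triangle}(P,Q) \le 4H^{2}(P,Q)$, where $H^{2}(P,Q) = \tfrac{1}{2}\sum_{x} \big(\sqrt{P(x)} - \sqrt{Q(x)}\big)^{2}$.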
Related papers
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning [43.51182049644767]
Semi-supervised learning (SSL) has long proven to be an effective technique for constructing powerful models with limited labels.
Regularization-based methods, which force perturbed samples to have predictions similar to those of the original samples, have attracted much attention.
We propose a novel contrastive loss to guide the embedding of the learned network to change linearly between samples.
arXiv Detail & Related papers (2022-02-24T06:00:05Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
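The importance-sampling-weighted ERM scheme summarized above can be sketched as follows (a minimal illustration of the general idea, not the paper's exact estimator; all names and the toy data are assumptions):

```python
import numpy as np

def iw_policy_risk(policy, contexts, actions, losses, propensities):
    """Importance-weighted estimate of a policy's expected loss from
    logged (context, action, loss, logging-probability) tuples: each
    observed loss is reweighted by 1/propensity, which keeps the
    estimate unbiased even when the data were collected adaptively."""
    chosen = np.array([policy(x) for x in contexts])
    match = (chosen == np.asarray(actions)).astype(float)
    return float(np.mean(match * np.asarray(losses) / np.asarray(propensities)))

def weighted_erm(policies, contexts, actions, losses, propensities):
    """Return the policy in the class with the smallest weighted risk."""
    risks = [iw_policy_risk(pi, contexts, actions, losses, propensities)
             for pi in policies]
    return policies[int(np.argmin(risks))], risks

# Toy log collected by a uniform two-action logging policy.
contexts = [0, 1, 2, 3]
actions = [0, 1, 0, 1]
losses = [0.2, 0.9, 0.1, 0.8]
propensities = [0.5, 0.5, 0.5, 0.5]
policies = [lambda x: 0, lambda x: 1]
best, risks = weighted_erm(policies, contexts, actions, losses, propensities)
```

The 1/propensity weights are what allow model-agnostic guarantees to be recovered from adaptively collected data.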
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs [48.44657553192801]
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning-rate schedule, combined with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
arXiv Detail & Related papers (2020-06-14T22:09:27Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
- Noise-tolerant, Reliable Active Classification with Comparison Queries [25.725730509014355]
We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label.
We provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise.
arXiv Detail & Related papers (2020-01-15T19:00:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.