Geometry-Aware Approaches for Balancing Performance and Theoretical
Guarantees in Linear Bandits
- URL: http://arxiv.org/abs/2306.14872v3
- Date: Sat, 30 Dec 2023 20:16:23 GMT
- Title: Geometry-Aware Approaches for Balancing Performance and Theoretical
Guarantees in Linear Bandits
- Authors: Yuwei Luo, Mohsen Bayati
- Abstract summary: Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds.
We propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid.
We identify and "course-correct" problem instances in which the base algorithms perform poorly.
- Score: 6.907555940790131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is motivated by recent research in the $d$-dimensional stochastic
linear bandit literature, which has revealed an unsettling discrepancy:
algorithms like Thompson sampling and Greedy demonstrate promising empirical
performance, yet this contrasts with their pessimistic theoretical regret
bounds. The challenge arises from the fact that while these algorithms may
perform poorly in certain problem instances, they generally excel in typical
instances. To address this, we propose a new data-driven technique that tracks
the geometric properties of the uncertainty ellipsoid around the main problem
parameter. This methodology enables us to formulate an instance-dependent
frequentist regret bound, which incorporates the geometric information, for a
broad class of base algorithms, including Greedy, OFUL, and Thompson sampling.
This result allows us to identify and ``course-correct'' problem instances in
which the base algorithms perform poorly. The course-corrected algorithms
achieve the minimax optimal regret of order $\tilde{\mathcal{O}}(d\sqrt{T})$
for a $T$-period decision-making scenario, effectively maintaining the
desirable attributes of the base algorithms, including their empirical
efficacy. We present simulation results to validate our findings using
synthetic and real data.
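To make the course-correction mechanism concrete, below is a minimal sketch in Python, not the paper's exact procedure: a Greedy linear bandit loop that monitors the spectrum of the regularized design matrix $V_t$, whose inverse determines the uncertainty ellipsoid, and substitutes an exploratory action whenever the ellipsoid becomes too elongated. The trigger $\lambda_{\min}(V_t) < \alpha\sqrt{t}$ and the choice of exploratory arm are illustrative assumptions.

```python
import numpy as np

def course_corrected_greedy(theta_star, arms, T, lam=1.0, alpha=0.5, seed=0):
    """Greedy linear bandit with a geometry-based course correction.

    Illustrative only: the trigger and the exploratory substitute below
    are assumptions, not the paper's exact rule.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)      # regularized design matrix; V^{-1} gives the
    b = np.zeros(d)          # shape of the uncertainty ellipsoid
    opt = (arms @ theta_star).max()
    regret = 0.0
    for t in range(1, T + 1):
        theta_hat = np.linalg.solve(V, b)           # ridge estimate
        eigvals, eigvecs = np.linalg.eigh(V)
        if eigvals[0] < alpha * np.sqrt(t):
            # Ellipsoid too elongated: course-correct by pulling the
            # arm most aligned with the least-explored direction.
            u = eigvecs[:, 0]
            x = arms[np.argmax(np.abs(arms @ u))]
        else:
            x = arms[np.argmax(arms @ theta_hat)]   # greedy action
        reward = x @ theta_star + rng.normal()      # noisy linear reward
        V += np.outer(x, x)
        b += reward * x
        regret += opt - x @ theta_star
    return regret
```

Forcing $\lambda_{\min}(V_t)$ to grow on the order of $\sqrt{t}$ is one standard sufficient condition for $\tilde{\mathcal{O}}(d\sqrt{T})$ regret in this setting, which is why it serves here as a plausible stand-in for the paper's instance-dependent trigger.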
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
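As a rough single-worker illustration of the mechanism (the `top_k` compressor, fixed stepsize, and omission of momentum are all assumptions made for brevity), error feedback re-injects what the compressor discarded, and the normalized variant applies the compressed direction at unit scale:

```python
import numpy as np

def top_k(v, k):
    """Contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_error_feedback(grad, x0, steps=500, eta=0.1, k=2):
    """Single-worker sketch of error feedback with a normalized update."""
    x = x0.astype(float).copy()
    e = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x) + e              # re-inject previously discarded error
        c = top_k(g, k)              # compressed message actually applied
        e = g - c                    # remember what the compressor dropped
        norm = np.linalg.norm(c)
        if norm > 0:
            x -= eta * c / norm      # normalized step: scale-free update,
                                     # the ingredient enabling larger
                                     # stepsizes under (L0, L1)-smoothness
    return x
```

For instance, `normalized_error_feedback(lambda x: A @ x - b, np.zeros(5))` runs the scheme on a quadratic objective; because the step length never depends on the gradient's magnitude, larger stepsizes remain stable, which matches the intuition behind the summary's claim.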
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models [6.54203362045253]
We study the problem of learning directed acyclic graphs from continuous observational data.
We develop a mixed-integer programming framework suited to medium-sized problems.
Our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity.
arXiv Detail & Related papers (2024-04-19T02:42:13Z)
- Towards Practical Robustness Auditing for Linear Regression [8.9598796481325]
We investigate algorithms to find or disprove the existence of small subsets of a dataset whose removal reverses the sign of a regression coefficient.
We make some headway on this challenge via a spectral algorithm using ideas drawn from recent innovations in algorithmic robust statistics, as sketched below.
We show that these methods largely outperform the state of the art and provide a useful robustness check for regression problems in a few dimensions.
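The "find" direction admits a simple heuristic, sketched below under assumed details (OLS, the first coefficient as the target conclusion, a one-step influence approximation that ignores leverage); certifying that no such subset exists is the hard direction the spectral machinery targets:

```python
import numpy as np

def greedy_removal_audit(X, y, k):
    """Greedily drop the k points most responsible for the sign of the
    first OLS coefficient and report whether the sign flips.

    Heuristic sketch only: it can exhibit a fragile regression, but it
    cannot certify that NO small subset flips the conclusion.
    """
    keep = np.ones(len(y), dtype=bool)
    for _ in range(k):
        Xk, yk = X[keep], y[keep]
        beta = np.linalg.lstsq(Xk, yk, rcond=None)[0]
        resid = yk - Xk @ beta
        # One-step influence of each kept point on beta[0]
        # (leverage correction omitted for brevity).
        infl = (Xk @ np.linalg.inv(Xk.T @ Xk))[:, 0] * resid
        # Remove the point whose deletion pushes beta[0] toward zero fastest.
        cand = np.flatnonzero(keep)
        keep[cand[np.argmax(np.sign(beta[0]) * infl)]] = False
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0][0]
    beta_sub = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0][0]
    return beta_full, beta_sub, np.sign(beta_full) != np.sign(beta_sub)
```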
arXiv Detail & Related papers (2023-07-30T20:47:36Z)
- High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance.
We derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation.
We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
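A minimal EtC sketch under illustrative assumptions (finite arm set, an environment oracle `pull`, least-squares estimation with a tiny ridge term) shows the single knob, `n_explore`, that governs the exploration-exploitation balance mentioned above:

```python
import numpy as np

def explore_then_commit(arms, pull, T, n_explore, seed=0):
    """Minimal EtC sketch for a linear bandit with a finite arm set.

    Explore uniformly for n_explore rounds, fit least squares once,
    then commit to the greedy arm for the remaining rounds.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = 1e-6 * np.eye(d)     # tiny ridge so the solve is well posed
    b = np.zeros(d)
    total = 0.0
    x_commit = None
    for t in range(T):
        if t < n_explore:
            x = arms[rng.integers(len(arms))]   # uniform exploration
        else:
            if x_commit is None:                # commit exactly once
                theta_hat = np.linalg.solve(V, b)
                x_commit = arms[np.argmax(arms @ theta_hat)]
            x = x_commit
        r = pull(x)
        if t < n_explore:                       # only exploration data
            V += np.outer(x, x)                 # enters the estimate
            b += r * x
        total += r
    return total
```

In the classical analysis of EtC-style methods, `n_explore` scales like $T^{2/3}$; the AEtC variant summarized above tunes this balance adaptively rather than fixing it in advance.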
arXiv Detail & Related papers (2023-06-19T15:29:32Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Sparse PCA: Algorithms, Adversarial Perturbations and Certificates [9.348107805982604]
We study efficient algorithms for Sparse PCA in standard statistical models.
Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations.
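One standard efficient algorithm in this line, shown as a sketch for orientation (the paper's contribution, certificates and resilience to adversarial perturbations, goes beyond it), is truncated power iteration: power steps on the sample covariance with hard thresholding to the $k$ largest-magnitude coordinates:

```python
import numpy as np

def truncated_power_method(S, k, iters=200, seed=0):
    """k-sparse leading eigenvector of a sample covariance S via
    truncated power iteration (a standard sparse-PCA algorithm).
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = S @ v
        w[np.argsort(np.abs(w))[:-k]] = 0.0   # keep only k coordinates
        v = w / np.linalg.norm(w)
    return v
```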
arXiv Detail & Related papers (2020-11-12T18:58:51Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.