An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits
- URL: http://arxiv.org/abs/2010.12247v2
- Date: Fri, 20 Nov 2020 09:19:44 GMT
- Title: An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits
- Authors: Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro
Lazaric
- Abstract summary: We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
- Score: 129.1029690825929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the contextual linear bandit setting, algorithms built on the optimism
principle fail to exploit the structure of the problem and have been shown to
be asymptotically suboptimal. In this paper, we follow recent approaches of
deriving asymptotically optimal algorithms from problem-dependent regret lower
bounds and we introduce a novel algorithm improving over the state-of-the-art
along multiple dimensions. We build on a reformulation of the lower bound,
where context distribution and exploration policy are decoupled, and we obtain
an algorithm robust to unbalanced context distributions. Then, using an
incremental primal-dual approach to solve the Lagrangian relaxation of the
lower bound, we obtain a scalable and computationally efficient algorithm.
Finally, we remove forced exploration and build on confidence intervals of the
optimization problem to encourage a minimum level of exploration that is better
adapted to the problem structure. We demonstrate the asymptotic optimality of
our algorithm, while providing both problem-dependent and worst-case
finite-time regret guarantees. Our bounds scale with the logarithm of the
number of arms, thus avoiding the linear dependence common in all related prior
works. Notably, we establish minimax optimality for any learning horizon in the
special case of non-contextual linear bandits. Finally, we verify that our
algorithm obtains better empirical performance than state-of-the-art baselines.
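As a concrete illustration of the incremental primal-dual idea in the abstract, the sketch below takes one projected (sub)gradient descent step on the exploration allocation w and one ascent step on the dual variables of a Graves-Lai-style lower-bound constraint set, for the non-contextual linear case. The Lagrangian form, step sizes, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch (not the paper's exact algorithm) of one incremental
# primal-dual step on a Lagrangian relaxation of a Graves-Lai-style lower
# bound for a linear bandit. X holds the K arm features (K x d), theta_hat
# the current parameter estimate; w is the exploration allocation and lam
# the dual variables. Step sizes and the regularizer are assumptions.
def primal_dual_step(X, theta_hat, w, lam, eta_w=0.05, eta_lam=0.05, reg=1e-6):
    K, d = X.shape
    rewards = X @ theta_hat
    star = int(np.argmax(rewards))
    gaps = rewards[star] - rewards                 # estimated gaps Delta_a
    A = (X * w[:, None]).T @ X + reg * np.eye(d)   # design matrix A(w)
    A_inv = np.linalg.inv(A)

    grad_w = gaps.copy()                           # gradient of sum_a w_a * Delta_a
    grad_lam = np.zeros(K)
    for a in range(K):
        if a == star:
            continue
        v = X[star] - X[a]
        u = A_inv @ v
        # Constraint c_a(w) = ||x* - x_a||^2_{A(w)^{-1}} - Delta_a^2 / 2 <= 0
        grad_lam[a] = v @ u - gaps[a] ** 2 / 2
        # d c_a / d w_b = -(x_b^T A(w)^{-1} (x* - x_a))^2, for every arm b
        grad_w -= lam[a] * (X @ u) ** 2

    # Descent on the allocation, with a crude renormalization in place of an
    # exact simplex projection; ascent on the (nonnegative) dual variables.
    w = np.maximum(w - eta_w * grad_w, 1e-12)
    w /= w.sum()
    lam = np.maximum(lam + eta_lam * grad_lam, 0.0)
    return w, lam
```

Interleaving one such step per round with a regularized least-squares update of theta_hat, and flooring w with a confidence-based minimum-exploration term instead of forced exploration, gives the flavor of the incremental scheme; the paper's actual updates and guarantees differ.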
Related papers
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Accelerated SGD for Non-Strongly-Convex Least Squares [14.010916616909743]
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.
We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem; a generic averaged-SGD template for this setting is sketched after this entry.
arXiv Detail & Related papers (2022-03-03T14:39:33Z)
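The optimal rates above are achieved by accelerated variants of averaged SGD; the snippet below is only the plain constant-step SGD with Polyak-Ruppert averaging template that such methods refine. The function name and step size are illustrative assumptions.

```python
import numpy as np

# Plain constant-step SGD with Polyak-Ruppert iterate averaging on least
# squares -- the baseline template that accelerated variants build on.
# This is not the paper's exact method; the step size is an assumption.
def averaged_sgd_least_squares(A, b, steps=10_000, step_size=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_bar = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)                   # sample one observation
        grad = (A[i] @ x - b[i]) * A[i]       # stochastic gradient of the square loss
        x -= step_size * grad
        x_bar += (x - x_bar) / t              # running average of the iterates
    return x_bar                              # return the averaged iterate
```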
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems [4.261680642170457]
We introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems.
Our proposed algorithm outperforms the existing state-of-the-art algorithms; a simplified proximal-gradient sketch for robust regression follows this entry.
arXiv Detail & Related papers (2021-06-25T15:07:13Z)
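As a hedged, simplified stand-in for the proximal machinery named above, the sketch below runs plain proximal gradient on a Huber-loss regression with an L1 penalty. It illustrates the proximal flavor of robust regression solvers; it is not the paper's PPMM construction.

```python
import numpy as np

# Simplified stand-in for a proximal method on robust regression: proximal
# gradient on a Huber loss with an L1 penalty. This is NOT the paper's PPMM
# algorithm; lam, delta, and the step size rule are assumptions.
def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def robust_regression(A, b, lam=0.1, delta=1.0, iters=500):
    n, d = A.shape
    step = n / np.linalg.norm(A, 2) ** 2      # 1/L for the smooth Huber part
    x = np.zeros(d)
    for _ in range(iters):
        r = A @ x - b
        dh = np.clip(r, -delta, delta)        # derivative of the Huber loss
        grad = A.T @ dh / n
        x = soft_threshold(x - step * grad, step * lam)  # proximal step on the L1 term
    return x
```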
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish complexity lower bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters; a simplified two-point zero-order sketch follows this entry.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
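The randomized approximation above builds gradient estimates from function values only; the sketch below uses the classical two-point spherical estimate inside projected gradient descent. The paper's estimator additionally kernel-weights the evaluations to exploit higher-order smoothness, so this is a simplified illustration with assumed parameter values.

```python
import numpy as np

# Classical two-point randomized gradient estimate inside projected gradient
# descent on a Euclidean ball. The paper's estimator additionally weights the
# two evaluations with a kernel to exploit higher-order smoothness; this
# unweighted version is a simplified illustration, and h/step are assumptions.
def zero_order_pgd(f, d, radius=1.0, h=1e-3, step=0.5, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for t in range(1, iters + 1):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                # uniform direction on the sphere
        g = d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u  # gradient estimate
        x -= (step / t) * g                   # decaying step size
        nx = np.linalg.norm(x)
        if nx > radius:
            x *= radius / nx                  # project back onto the ball
    return x
```

For instance, `zero_order_pgd(lambda z: z @ z + 1.0, d=5)` drives the iterate toward the minimizer at the origin using only function evaluations.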
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.