An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear
Bandits
- URL: http://arxiv.org/abs/2102.05295v1
- Date: Wed, 10 Feb 2021 07:30:37 GMT
- Title: An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear
Bandits
- Authors: Xin Liu, Bin Li, Pengyi Shi, Lei Ying
- Abstract summary: This paper considers linear bandits with general constraints.
The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round.
We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects.
- Score: 16.103685478563072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers stochastic linear bandits with general constraints. The
objective is to maximize the expected cumulative reward over horizon $T$
subject to a set of constraints in each round $\tau\leq T$. We propose a
pessimistic-optimistic algorithm for this problem, which is efficient in two
aspects. First, the algorithm yields $\tilde{\cal
O}\left(\left(\frac{K^{1.5}}{\delta^2}+d\right)\sqrt{\tau}\right)$ (pseudo)
regret in round $\tau\leq T,$ where $K$ is the number of constraints, $d$ is
the dimension of the reward feature space, and $\delta$ is a Slater's constant;
and zero constraint violation in any round $\tau>\tau',$ where $\tau'$ is
independent of horizon $T.$ Second, the algorithm is computationally efficient.
Our algorithm is based on the primal-dual approach in optimization, and
includes two components. The primal component is similar to unconstrained
stochastic linear bandits (our algorithm uses the linear upper confidence bound
algorithm (LinUCB)). The computational complexity of the dual component depends
on the number of constraints, and is independent of sizes of the contextual
space, the action space, and even the feature space. So the overall
computational complexity of our algorithm is similar to the linear UCB for
unconstrained stochastic linear bandits.
Related papers
- Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the adversarial and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z) - Projection-free Online Exp-concave Optimization [21.30065439295409]
We present an LOO-based ONS-style algorithm, which using overall $O(T)$ calls to a LOO, guarantees in worst case regret bounded by $widetildeO(n2/3T2/3)$.
Our algorithm is most interesting in an important and plausible low-dimensional data scenario.
arXiv Detail & Related papers (2023-02-09T18:58:05Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Computationally Efficient Horizon-Free Reinforcement Learning for Linear
Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least square estimator for the unknown transitional dynamic.
This also improves upon the best-known algorithms in this setting when $sigma_k2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL of Agarwal et al. achieves this goal with a regret overhead of order $widetildeO(sqrtd S T)$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only emphlogarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Gradient-Variation Bound for Online Convex Optimization with Constraints [25.002868073267464]
We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball.
First-order methods achieve an $mathcalO(sqrtT)$ regret and an $mathcalO(1)$ constraint violation, but do not take into account the structural information of the problem.
In this paper, we provide an emphinstance-dependent bound for online convex optimization with complex constraints obtained by a novel online primal-dual mirror-prox algorithm.
arXiv Detail & Related papers (2020-06-22T17:38:14Z) - Statistically Efficient, Polynomial Time Algorithms for Combinatorial
Semi Bandits [3.9919781077062497]
We consider semi-bandits over a set of arms $cal X subset 0,1d$ where rewards are uncorrelated across items.
For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = cal OBig( d (ln m)2 (ln T) over Delta_min Big)$, but it has computational complexity $cal O(|cal X|)$ which is typically exponential in $d$, and cannot
arXiv Detail & Related papers (2020-02-17T21:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.