PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits
- URL: http://arxiv.org/abs/2210.15345v3
- Date: Fri, 17 Nov 2023 21:45:53 GMT
- Title: PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits
- Authors: Kyoungseok Jang, Chicheng Zhang, Kwang-Sung Jun
- Abstract summary: We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
- Score: 29.097522376094624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sparse linear bandits, a learning agent sequentially selects an action and
receives reward feedback, and the reward function depends linearly on a few
coordinates of the covariates of the actions. This has applications in many
real-world sequential decision making problems. In this paper, we propose a
simple and computationally efficient sparse linear estimation method called
PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso
(Tibshirani, 1996) in many problems. Our bound naturally motivates an
experimental design criterion that is convex and thus computationally efficient
to solve. Based on our novel estimator and design criterion, we derive sparse
linear bandit algorithms that enjoy improved regret upper bounds upon the state
of the art (Hao et al., 2020), especially w.r.t. the geometry of the given
action set. Finally, we prove a matching lower bound for sparse linear bandits
in the data-poor regime, which closes the gap between upper and lower bounds in
prior work.
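
The abstract's key ingredients (a per-sample unbiased estimate reweighted by the inverse population covariance of the sampling distribution, followed by hard thresholding to exploit sparsity) can be sketched as follows. This is a minimal toy illustration, not the paper's algorithm: the Gaussian action distribution, noise level, sample size, and threshold `tau` are all hypothetical choices, and the plain average below stands in for the paper's more careful per-coordinate mean estimation. The paper's convex experimental design criterion would additionally optimize the sampling distribution (and hence `Q`) before collecting data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, n = 20, 3, 5000  # ambient dimension, sparsity, sample size

# Ground-truth s-sparse parameter (hypothetical toy instance).
theta = np.zeros(d)
theta[:s] = [1.0, -0.5, 0.8]

# Sampling distribution over actions: i.i.d. standard Gaussian here,
# so the population covariance Q = E[a a^T] is the identity.
A = rng.standard_normal((n, d))
Q = np.eye(d)

# Noisy linear rewards r_t = <a_t, theta> + noise.
r = A @ theta + 0.1 * rng.standard_normal(n)

# Average the one-sample unbiased estimates Q^{-1} a_t r_t.
Qinv = np.linalg.inv(Q)
theta_hat = Qinv @ (A * r[:, None]).sum(axis=0) / n

# Hard-threshold small coordinates to exploit sparsity. The threshold
# is ad hoc here; the paper derives it from an l1 recovery bound.
tau = 0.15
theta_popart = np.where(np.abs(theta_hat) > tau, theta_hat, 0.0)

print(np.nonzero(theta_popart)[0])  # recovered support
```

With identity covariance the reweighting is a no-op; the point of the sketch is that when actions are sampled from a designed distribution, `Qinv` corrects for its geometry, which is where the experimental design criterion enters.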
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits [25.88978373435619]
We study low-rank matrix trace regression and the related problem of low-rank matrix bandits.
We propose a novel low-rank matrix estimation method called LowPopArt.
We show that our method can provide tighter recovery guarantees than classical nuclear norm penalized least squares.
arXiv Detail & Related papers (2024-02-17T00:51:29Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios -- model selection and clustering in linear bandits.
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.