PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits
- URL: http://arxiv.org/abs/2210.15345v3
- Date: Fri, 17 Nov 2023 21:45:53 GMT
- Title: PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits
- Authors: Kyoungseok Jang, Chicheng Zhang, Kwang-Sung Jun
- Abstract summary: We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
- Score: 29.097522376094624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sparse linear bandits, a learning agent sequentially selects an action and
receives reward feedback, and the reward function depends linearly on a few
coordinates of the covariates of the actions. This has applications in many
real-world sequential decision making problems. In this paper, we propose a
simple and computationally efficient sparse linear estimation method called
PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso
(Tibshirani, 1996) in many problems. Our bound naturally motivates an
experimental design criterion that is convex and thus computationally efficient
to solve. Based on our novel estimator and design criterion, we derive sparse
linear bandit algorithms that enjoy improved regret upper bounds upon the state
of the art (Hao et al., 2020), especially w.r.t. the geometry of the given
action set. Finally, we prove a matching lower bound for sparse linear bandits
in the data-poor regime, which closes the gap between upper and lower bounds in
prior work.
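
The abstract's key ingredients (a per-sample unbiased estimate reweighted by the inverse population covariance of the sampling distribution, followed by hard thresholding to exploit sparsity) can be sketched as follows. This is a minimal toy illustration, not the paper's algorithm: the Gaussian action distribution, noise level, sample size, and threshold `tau` are all hypothetical choices, and the plain average below stands in for the paper's more careful per-coordinate mean estimation. The paper's convex experimental design criterion would additionally optimize the sampling distribution (and hence `Q`) before collecting data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, n = 20, 3, 5000  # ambient dimension, sparsity, sample size

# Ground-truth s-sparse parameter (hypothetical toy instance).
theta = np.zeros(d)
theta[:s] = [1.0, -0.5, 0.8]

# Sampling distribution over actions: i.i.d. standard Gaussian here,
# so the population covariance Q = E[a a^T] is the identity.
A = rng.standard_normal((n, d))
Q = np.eye(d)

# Noisy linear rewards r_t = <a_t, theta> + noise.
r = A @ theta + 0.1 * rng.standard_normal(n)

# Average the one-sample unbiased estimates Q^{-1} a_t r_t.
Qinv = np.linalg.inv(Q)
theta_hat = Qinv @ (A * r[:, None]).sum(axis=0) / n

# Hard-threshold small coordinates to exploit sparsity. The threshold
# is ad hoc here; the paper derives it from an l1 recovery bound.
tau = 0.15
theta_popart = np.where(np.abs(theta_hat) > tau, theta_hat, 0.0)

print(np.nonzero(theta_popart)[0])  # recovered support
```

With identity covariance the reweighting is a no-op; the point of the sketch is that when actions are sampled from a designed distribution, `Qinv` corrects for its geometry, which is where the experimental design criterion enters.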
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits [25.88978373435619]
We study low-rank matrix trace regression and the related problem of low-rank matrix bandits.
We propose a novel low-rank matrix estimation method called LowPopArt.
We show that our method can provide tighter recovery guarantees than classical nuclear norm penalized least squares.
arXiv Detail & Related papers (2024-02-17T00:51:29Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios -- model selection and clustering in linear bandits.
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.