Parameter and Feature Selection in Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2106.05378v1
- Date: Wed, 9 Jun 2021 20:37:29 GMT
- Title: Parameter and Feature Selection in Stochastic Linear Bandits
- Authors: Ahmadreza Moradipari, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh,
Mohammad Ghavamzadeh
- Abstract summary: We study two model selection settings in linear bandits (LB).
In the first setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as overlapping balls in $\mathbb{R}^d$.
In the second setting, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models).
- Score: 38.909757749493934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study two model selection settings in stochastic linear bandits (LB). In
the first setting, the reward parameter of the LB problem is arbitrarily
selected from $M$ models represented as (possibly) overlapping balls in
$\mathbb R^d$. However, the agent only has access to misspecified models, i.e.,
estimates of the centers and radii of the balls. We refer to this setting as
parameter selection. In the second setting, which we refer to as feature
selection, the expected reward of the LB problem is in the linear span of at
least one of $M$ feature maps (models). For each setting, we develop and
analyze an algorithm that is based on a reduction from bandits to
full-information problems. This allows us to obtain regret bounds that are not
worse (up to a $\sqrt{\log M}$ factor) than the case where the true model is
known. Our parameter selection algorithm is OFUL-style and the one for feature
selection is based on the SquareCB algorithm. We also show that the regret of
our parameter selection algorithm scales logarithmically with model
misspecification.
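The abstract names the two base algorithms its model-selection procedures build on. As a rough, illustrative sketch (not the paper's own algorithms, which add the model-selection layer on top), the core decision rules of OFUL-style optimism and of SquareCB can be written in a few lines; `beta` (confidence width) and `gamma` (inverse-gap learning rate) are placeholder parameters here.

```python
import numpy as np

def oful_index(V, b, arms, beta):
    """OFUL-style optimism: pick the arm maximizing the upper confidence
    bound <theta_hat, x> + beta * ||x||_{V^{-1}} over the feature vectors."""
    theta_hat = np.linalg.solve(V, b)            # regularized least-squares estimate
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(np.sum((arms @ V_inv) * arms, axis=1))  # ||x||_{V^{-1}} per arm
    ucbs = arms @ theta_hat + beta * widths
    return int(np.argmax(ucbs))

def squarecb_probs(preds, gamma):
    """SquareCB inverse-gap weighting: given a regression oracle's reward
    predictions for K arms, play each suboptimal arm a with probability
    1 / (K + gamma * (max_pred - preds[a])), and the rest on the best arm."""
    K = len(preds)
    best = int(np.argmax(preds))
    p = 1.0 / (K + gamma * (preds[best] - preds))
    p[best] = 0.0
    p[best] = 1.0 - p.sum()
    return p
```

For example, with predictions `[1.0, 0.5, 0.0]` and a large `gamma`, nearly all probability mass lands on the first arm; as `gamma` shrinks, the distribution flattens toward uniform exploration.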
Related papers
- Parameter-free Algorithms for the Stochastically Extended Adversarial Model [59.81852138768642]
Existing approaches for the Stochastically Extended Adversarial (SEA) model require prior knowledge of problem-specific parameters, such as the diameter of the domain $D$ and the Lipschitz constant of the loss functions $G$.
We develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters.
arXiv Detail & Related papers (2025-10-06T10:53:37Z) - An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models [86.05414211113627]
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data.
We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models.
arXiv Detail & Related papers (2025-01-28T01:57:51Z) - Rising Rested Bandits: Lower Bounds and Efficient Algorithms [15.390680055166769]
This paper is in the field of sequential Multi-Armed Bandits (MABs).
We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2024-11-06T22:00:46Z) - Sparse Linear Bandits with Blocking Constraints [22.01704171400845]
We investigate the high-dimensional sparse linear bandits problem in a data-poor regime.
We show novel offline statistical guarantees of the lasso estimator for the linear model.
We propose a meta-algorithm based on corralling that does not need knowledge of the optimal sparsity parameter $k$, at minimal cost to regret.
arXiv Detail & Related papers (2024-10-26T01:42:03Z) - Online Clustering of Bandits with Misspecified User Models [42.56440072468658]
Contextual linear bandit is an online learning problem where given arm features, a learning agent selects an arm at each round to maximize the cumulative rewards in the long run.
A line of works, called the clustering of bandits (CB), utilize the collaborative effect over user preferences and have shown significant improvements over classic linear bandit algorithms.
In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM).
We devise two robust CB algorithms, RCLUMB and RSCLUMB, that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications.
arXiv Detail & Related papers (2023-10-04T10:40:50Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, which has an exponentially improved dependence on $M$ for its regret.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
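ALEXP's improved dependence on $M$ comes from aggregating models with exponential weights rather than corralling them; the update itself is the standard multiplicative-weights step. A minimal sketch of that generic building block (not the paper's full algorithm, which pairs it with a time-uniform Lasso analysis):

```python
import numpy as np

def exp_weights_update(weights, losses, eta):
    """Multiplicative-weights step over M candidate models: downweight
    each model proportionally to exp(-eta * loss), then renormalize."""
    w = weights * np.exp(-eta * np.asarray(losses))
    return w / w.sum()
```

Models with consistently lower loss accumulate probability mass exponentially fast, which is what yields the mild (logarithmic) dependence on the number of models in exponential-weights regret bounds.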
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z) - Online Recommendations for Agents with Discounted Adaptive Preferences [17.501559059079806]
We study a bandit recommendation problem in which an agent's preferences evolve as a function of past selections.
We present an algorithm that obtains efficient sublinear regret with respect to nearly the \textit{entire} item simplex.
arXiv Detail & Related papers (2023-02-12T22:04:27Z) - Model Selection in Reinforcement Learning with General Function
Approximations [10.97775622611135]
We consider model selection for Reinforcement Learning environments -- Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs).
In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, where the true models lie.
We show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes.
arXiv Detail & Related papers (2022-07-06T21:52:07Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Learning to Rank under Multinomial Logit Choice [6.929312022493406]
Learning the optimal ordering of content is an important challenge in website design.
We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, and an $\tilde{O}(\sqrt{JT})$ upper bound on the regret of the UCB algorithm.
arXiv Detail & Related papers (2020-09-07T16:15:12Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.