Related papers: Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

URL: http://arxiv.org/abs/2409.08570v1
Date: Fri, 13 Sep 2024 06:40:56 GMT
Title: Batch Ensemble for Variance Dependent Regret in Stochastic Bandits
Authors: Asaf Cassel, Orin Levy, Yishay Mansour,
Abstract summary: Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL) Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that achieves near-optimal regret for Multi-Armed Bandits (MAB) Our algorithm has just a single parameter namely the number of batches, and its value does not depend on distributional properties such as the scale and variance of the losses.
Score: 41.95653110232677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL). Most works achieve this by carefully estimating the model uncertainty and following the so-called optimistic model. Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that provably achieves near-optimal regret for stochastic Multi-Armed Bandits (MAB). Crucially, our algorithm has just a single parameter, namely the number of batches, and its value does not depend on distributional properties such as the scale and variance of the losses. We complement our theoretical results by demonstrating the effectiveness of our algorithm on synthetic benchmarks.

Related papers

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [60.414548453838506]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function.<n>GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency.<n>We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $mathcalO(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z)
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
TS-RSR: A provably efficient approach for batch bayesian optimization [4.622871908358325]
This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points. We demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions.
arXiv Detail & Related papers (2024-03-07T18:58:26Z)
Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques. In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets. Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective. We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with bootstrapping-based sequential thresholding least-squares estimator. We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging. One typically resorts to sampling-based approximations of the true marginal. We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
arXiv Detail & Related papers (2020-07-03T19:36:35Z)
Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems. The mean rewards of arms satisfy some given structural constraints. We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed. We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA)
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB. We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits. Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.