A Batch Sequential Halving Algorithm without Performance Degradation
- URL: http://arxiv.org/abs/2406.00424v1
- Date: Sat, 1 Jun 2024 12:41:50 GMT
- Title: A Batch Sequential Halving Algorithm without Performance Degradation
- Authors: Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii,
- Abstract summary: We show that a simple Sequential batch algorithm does not degrade the performance under practical conditions.
We empirically validate our claim through experiments, demonstrating the robust nature of the algorithm in fixed-size batch settings.
- Score: 0.8283940114367677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the problem of pure exploration in the context of multi-armed bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches. Batching has been shown to enhance computational efficiency, but it can potentially lead to a degradation compared to the original sequential algorithm's performance due to delayed feedback and reduced adaptability. We introduce a simple batch version of the Sequential Halving (SH) algorithm (Karnin et al., 2013) and provide theoretical evidence that batching does not degrade the performance of the original algorithm under practical conditions. Furthermore, we empirically validate our claim through experiments, demonstrating the robust nature of the SH algorithm in fixed-size batch settings.
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Batch Ensemble for Variance Dependent Regret in Stochastic Bandits [41.95653110232677]
Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL)
Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that achieves near-optimal regret for Multi-Armed Bandits (MAB)
Our algorithm has just a single parameter namely the number of batches, and its value does not depend on distributional properties such as the scale and variance of the losses.
arXiv Detail & Related papers (2024-09-13T06:40:56Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Anti-Exploration by Random Network Distillation [63.04360288089277]
We show that a naive choice of conditioning for the Random Network Distillation (RND) is not discriminative enough to be used as an uncertainty estimator.
We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM)
We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
arXiv Detail & Related papers (2023-01-31T13:18:33Z) - Batch Bayesian optimisation via density-ratio estimation with guarantees [26.052368583196426]
We present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates.
We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference.
arXiv Detail & Related papers (2022-09-22T00:42:18Z) - Regret Bounds for Expected Improvement Algorithms in Gaussian Process
Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $mathcal O(gamma_TsqrtT)$.
arXiv Detail & Related papers (2022-03-15T13:17:53Z) - Extreme Bandits using Robust Statistics [12.6543086847761]
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.
We propose distribution free algorithms using robust statistics and characterize the statistical properties.
arXiv Detail & Related papers (2021-09-09T17:24:15Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian
Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB)
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.