Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
- URL: http://arxiv.org/abs/2509.02119v1
- Date: Tue, 02 Sep 2025 09:16:00 GMT
- Title: Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
- Authors: Chanakya Varude, Jay Chaudhary, Siddharth Kaushik, Prasanna Chaporkar,
- Abstract summary: In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward.<n>This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold (tau)
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold \(\tau \). We study variants where the optimal arm is the first above \(\tau\), the \(k^{th}\) arm above or below it, or the closest to it, under a monotonic structure of arm means. We derive asymptotic regret lower bounds, showing dependence only on arms adjacent to \(\tau\). Motivated by applications in communication networks (CQI allocation), clinical dosing, energy management, recommendation systems, and more. We propose algorithms with optimality validated through Monte Carlo simulations. Our work extends classical bandit theory with threshold constraints for efficient decision-making.
Related papers
- Fixed-Budget Constrained Best Arm Identification in Grouped Bandits [1.360738859820932]
We study fixed budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with rewards.<n>We propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility.
arXiv Detail & Related papers (2026-03-04T12:49:56Z) - Influential Bandits: Pulling an Arm May Change the Environment [44.71145269686588]
Real-world applications often involve non-stationary environments and interdependencies between arms.<n>We propose the influential bandit problem, which models inter-arm interactions through an unknown, symmetric, positive semi-definite interaction matrix.<n>We introduce a new algorithm based on a lower confidence bound (LCB) estimator tailored to the structure of the loss dynamics.
arXiv Detail & Related papers (2025-04-11T02:05:51Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.<n>The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.<n>We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Neural Dueling Bandits: Preference-Based Optimization with Human Feedback [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.<n>We also extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Finding Optimal Arms in Non-stochastic Combinatorial Bandits with
Semi-bandit Feedback and Finite Budget [6.759124697337311]
We consider the bandits problem with semi-bandit feedback under finite sampling budget constraints.
The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received.
We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies.
arXiv Detail & Related papers (2022-02-09T14:36:05Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.