Max-Min Grouped Bandits
- URL: http://arxiv.org/abs/2111.08862v1
- Date: Wed, 17 Nov 2021 01:59:15 GMT
- Title: Max-Min Grouped Bandits
- Authors: Zhenlin Wang and Jonathan Scarlett
- Abstract summary: We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest in applications such as recommendation systems.
- Score: 48.62520520818357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a multi-armed bandit problem termed max-min
grouped bandits, in which the arms are arranged in possibly-overlapping groups,
and the goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest in applications such as recommendation systems, and
is also closely related to widely-studied robust optimization problems. We
present two algorithms based on successive elimination and robust optimization,
and derive upper bounds on the number of samples to guarantee finding a max-min
optimal or near-optimal group, as well as an algorithm-independent lower bound.
We discuss the degree of tightness of our bounds in various cases of interest,
and the difficulties in deriving uniformly tight bounds.
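To make the objective concrete, the following is a minimal sketch of the max-min group criterion together with a naive uniform-sampling baseline. It is an illustration only, not the paper's successive-elimination or robust-optimization algorithm; the toy group structure, Gaussian noise, and per-arm budget are assumptions.

```python
import numpy as np

def max_min_group(groups, means):
    """Index of the group whose worst arm has the highest mean reward."""
    return max(range(len(groups)), key=lambda g: min(means[a] for a in groups[g]))

def naive_identification(groups, pull, n_arms, budget_per_arm=200):
    """Uniformly sample every arm, then pick the empirically max-min group.

    `pull(arm)` returns one noisy reward. This baseline ignores the adaptive
    sampling that the paper's algorithms use to reduce sample complexity.
    """
    est = [np.mean([pull(a) for _ in range(budget_per_arm)]) for a in range(n_arms)]
    return max_min_group(groups, est)

# Toy instance: 4 arms arranged in 3 possibly-overlapping groups.
true_means = [0.9, 0.2, 0.6, 0.5]
groups = [[0, 1], [2, 3], [0, 2]]        # group 2 = {0, 2}; its worst mean is 0.6
rng = np.random.default_rng(0)
pull = lambda a: true_means[a] + rng.normal(0.0, 0.1)
print(max_min_group(groups, true_means))              # ground truth: 2
print(naive_identification(groups, pull, n_arms=4))   # typically also 2
```

Uniform sampling spends the same budget on every arm; the paper's question is precisely how much adaptivity (eliminating arms and groups early) can reduce that cost, and where the lower bound says it cannot.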
Related papers
- Optimization of Inter-group Criteria for Clustering with Minimum Size
Constraints [2.7195102129095003]
Internal measures that are used to assess the quality of a clustering usually take into account intra-group and/or inter-group criteria.
We propose algorithms with provable approximation guarantees for optimizing the inter-group criterion.
We present an empirical study with 10 real datasets, providing evidence that our methods work very well in practical settings.
arXiv Detail & Related papers (2024-01-13T14:59:12Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods for the R-CPE-MAB can be seen as special cases of so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark; the bound is spelled out below.
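For concreteness, here is the bound in display form, with standard gap notation filled in as an assumption (the abstract does not define $\Delta_i$): writing $a^*$ for the Condorcet winner and $\Delta_i$ for arm $i$'s gap to it,

```latex
% K arms, horizon T; Delta_i = P(a^* beats arm i) - 1/2 (assumed definition).
\mathbb{E}[R_T] = O\!\left( \sum_{i=1}^{K} \frac{\log T}{\Delta_i} \right)
```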
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- Contextual Combinatorial Multi-output GP Bandits with Group Constraints [11.317136648551537]
In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
We consider a contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups; in each round, a set of base arms called a super arm must be chosen to maximize super-arm reward while satisfying the reward constraints of the groups from which the base arms were chosen.
We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial GP Upper Confidence Bounds (TCGP-UCB), which balances maximizing cumulative super-arm reward against satisfying group reward constraints.
arXiv Detail & Related papers (2021-11-29T18:39:09Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best-arm identification in sub-Gaussian bandits.
Specifically, the gap-based algorithm's sample complexity is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors; a generic (non-robust) successive-elimination loop is sketched below.
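As shared context for this entry and the main paper's first algorithm, here is a minimal sketch of classical successive elimination for best-arm identification. It is the textbook sub-Gaussian version, not the contamination-robust variant this paper proposes; the confidence radius and the failure probability `delta` are standard assumptions.

```python
import math
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10_000):
    """Classical successive elimination: sample every active arm once per
    round, then drop any arm whose upper confidence bound falls below the
    best arm's lower confidence bound. Returns the surviving arm."""
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for a in active:                       # one pull per active arm
            sums[a] += pull(a)
        means = sums[active] / t               # every active arm has t pulls
        # Hoeffding-style radius for 1-sub-Gaussian rewards (standard choice).
        rad = math.sqrt(2.0 * math.log(4.0 * n_arms * t * t / delta) / t)
        best = means.max()
        active = [a for a, m in zip(active, means) if m + rad >= best - rad]
        if len(active) == 1:
            return active[0]
    return active[int(np.argmax(sums[active]))]

rng = np.random.default_rng(1)
mu = [0.3, 0.5, 0.8]
print(successive_elimination(lambda a: mu[a] + rng.normal(0.0, 0.2), 3))
```

Under reward contamination, the plain empirical means above would presumably be replaced by robust mean estimators, which is the direction this paper's title suggests.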
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Our method, based on a projection-free online learning algorithm for finite polytopes, is the first computationally efficient algorithm that is asymptotically optimal, and it has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity; a naive empirical-quantile baseline is sketched below.
arXiv Detail & Related papers (2020-06-11T20:23:43Z)
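To illustrate the quantile objective only (this is neither the paper's $\delta$-PAC algorithm nor its differentially private scheme), here is a naive fixed-budget baseline; the quantile level `tau` and the per-arm budget are assumptions.

```python
import numpy as np

def best_quantile_arm(pull, n_arms, tau=0.9, budget_per_arm=500):
    """Pull every arm `budget_per_arm` times and return the arm with the
    highest empirical tau-quantile. Fixed-budget illustration only: it
    carries no (delta-)PAC guarantee and adds no differential privacy."""
    q = [np.quantile([pull(a) for _ in range(budget_per_arm)], tau)
         for a in range(n_arms)]
    return int(np.argmax(q))

rng = np.random.default_rng(2)
# Arm 1 has a heavier right tail, so it wins at high quantiles even though
# its mean (0.4) is below arm 0's mean (0.5).
samplers = [lambda: rng.normal(0.5, 0.1), lambda: rng.exponential(0.4)]
print(best_quantile_arm(lambda a: samplers[a](), n_arms=2, tau=0.9))  # -> 1
```

The example shows why quantile objectives differ from mean-based ones: the arm ranking can flip with the quantile level, which is what makes the identification problem distinct from standard best-arm identification.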
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.