Optimal Regret Bounds for Collaborative Learning in Bandits
- URL: http://arxiv.org/abs/2312.09674v1
- Date: Fri, 15 Dec 2023 10:36:13 GMT
- Title: Optimal Regret Bounds for Collaborative Learning in Bandits
- Authors: Amitis Shidani and Sattar Vakili
- Abstract summary: We consider regret in a general collaborative multi-agent multi-armed bandit model.
We propose the first algorithm with order optimal regret bounds under this model.
- Score: 10.76667043339504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider regret minimization in a general collaborative multi-agent
multi-armed bandit model, in which each agent faces a finite set of arms and
may communicate with other agents through a central controller. The optimal arm
for each agent in this model is the arm with the largest expected mixed reward,
where the mixed reward of each arm is a weighted average of its rewards across
all agents, making communication among agents crucial. While near-optimal
sample complexities for best arm identification are known under this
collaborative model, the question of optimal regret remains open. In this work,
we address this problem and propose the first algorithm with order optimal
regret bounds under this collaborative bandit model. Furthermore, we show that
only a small constant number of expected communication rounds is needed.
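The mixed-reward objective from the abstract can be illustrated with a small sketch. This is not the paper's algorithm, only a hypothetical numerical example (all values made up) of how each agent's optimal arm is determined by a weighted average of expected rewards across agents:

```python
import numpy as np

# Hypothetical illustration of the mixed-reward objective (values made up).
# mu[m, k] = expected reward of arm k for agent m.
mu = np.array([[0.9, 0.2, 0.5],
               [0.1, 0.8, 0.5]])

# w[m, n] = weight agent m places on agent n's rewards; each row sums to 1.
w = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Mixed reward of arm k for agent m: a weighted average across all agents.
mixed = w @ mu

# The optimal arm for each agent maximizes its mixed reward.
optimal_arms = mixed.argmax(axis=1)
print(optimal_arms)  # → [0 1]
```

Because each agent's objective depends on other agents' reward distributions, no agent can identify its optimal arm from its own observations alone, which is why communication through the central controller is essential.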
Related papers
- An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
We consider the best arm identification problem in the $K$-armed bandit framework.
The agent is allowed to play a subset of arms at each time slot instead of a single arm.
We propose an algorithm that constructs $\log K$ groups and performs a likelihood ratio test to detect the presence of the best arm.
arXiv Detail & Related papers (2025-02-03T15:10:08Z)
- Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm for every objective in the shortest (expected) time, subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve a max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z)
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Best Arm Identification with Minimal Regret [55.831935724659175]
This problem elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects arms on the agent's behalf according to its own (possibly unknown) policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Near-Optimal Collaborative Learning in Bandits [15.456561090871244]
This paper introduces a general multi-agent bandit model in which each agent is facing a finite set of arms.
The twist is that the optimal arm for each agent is the arm with the largest expected mixed reward, where the mixed reward of an arm is a weighted sum of the rewards of this arm for all agents.
We propose a near-optimal algorithm for pure exploration.
arXiv Detail & Related papers (2022-05-31T21:11:47Z)
- Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration indeed decreases regret for this algorithm, assuming the number of malicious agents $m$ is small compared to $K$, but without assumptions on the malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z)
- Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
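Several of the related papers above (EgalUCB, Double KL-UCB) build on upper-confidence-bound indices. For orientation, here is a minimal sketch of the classic single-agent UCB1 index — a standard textbook algorithm, not any specific algorithm from these papers — on Bernoulli arms with made-up means:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play each arm once.
            arm = t - 1
        else:
            # Play the arm maximizing empirical mean + confidence radius.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# With a large gap between the arms, UCB1 concentrates pulls on the better arm.
counts = ucb1([0.2, 0.8], horizon=2000)
```

The confidence radius $\sqrt{2 \log t / n_a}$ shrinks as an arm is pulled more often, which is what drives the logarithmic regret guarantees these index policies enjoy.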
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.