Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
- URL: http://arxiv.org/abs/2506.14988v3
- Date: Fri, 25 Jul 2025 06:06:47 GMT
- Title: Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
- Authors: Tianyi Xu, Jiaxin Liu, Nicholas Mattei, Zizhan Zheng
- Abstract summary: We introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness.
- Score: 15.700062892888084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
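The abstract does not reproduce the algorithm itself, but the offline greedy probing it describes follows the classical pattern for monotone submodular maximization. Below is a minimal sketch under assumed interfaces: the `value` function (expected best reward revealed by a probe set, estimated by Monte Carlo) and the cardinality budget are hypothetical stand-ins for the paper's actual objective and constraints.

```python
import numpy as np

def greedy_probe_set(arms, value, budget):
    """Greedily add the arm with the largest marginal gain
    value(S + [a]) - value(S) until the probing budget is spent.
    For a monotone submodular `value`, this rule carries the classical
    (1 - 1/e) approximation guarantee, the style of bound the paper's
    offline analysis provides."""
    probed, remaining = [], set(arms)
    for _ in range(budget):
        base = value(probed)
        gains = {a: value(probed + [a]) - base for a in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no arm adds value under the estimate; stop early
        probed.append(best)
        remaining.remove(best)
    return probed

# Hypothetical objective: expected best reward revealed by the probe set,
# estimated by Monte Carlo from the known (offline) reward distributions.
rng = np.random.default_rng(0)
means = {a: rng.uniform(0, 1) for a in range(10)}

def value(subset, n_samples=2000):
    if not subset:
        return 0.0
    draws = rng.normal([means[a] for a in subset], 0.1,
                       size=(n_samples, len(subset)))
    return draws.max(axis=1).mean()  # expected best observed reward

print(greedy_probe_set(list(means), value, budget=3))
```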
Related papers
- Representative Action Selection for Large Action-Space Meta-Bandits [49.386906771833274]
We study the problem of selecting a subset from a large action space shared by a family of bandits. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. We propose a simple epsilon-net algorithm to select a representative subset.
arXiv Detail & Related papers (2025-05-23T18:08:57Z)
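The entry above names an epsilon-net construction for pruning a large action space. A minimal sketch follows, assuming Euclidean action features; the greedy covering rule here is one standard way to build such a net, not necessarily the paper's exact procedure.

```python
import numpy as np

def epsilon_net(actions, eps):
    """Greedy epsilon-net: keep an action only if it is more than `eps`
    away from every representative chosen so far. Every discarded action
    then has a representative within distance eps, so under the paper's
    smoothness assumption (nearby actions have related payoffs), playing
    only the net loses little reward."""
    net = []
    for a in actions:
        if all(np.linalg.norm(a - r) > eps for r in net):
            net.append(a)
    return net

rng = np.random.default_rng(0)
actions = rng.uniform(0, 1, size=(500, 2))  # a large 2-D action space
reps = epsilon_net(actions, eps=0.2)
print(f"{len(reps)} representatives cover {len(actions)} actions")
```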
- Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for combinatorial multi-armed bandits (CMAB). Its CLCB algorithm combines pessimistic reward estimations with combinatorial solvers. Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
arXiv Detail & Related papers (2025-01-31T16:56:18Z)
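The Off-CMAB entry above pairs pessimistic (lower-confidence-bound) reward estimates with a combinatorial solver. A minimal sketch of that pattern, with a Hoeffding-style radius and plain top-k selection standing in for CLCB's actual estimator and approximation oracle:

```python
import numpy as np

def pessimistic_top_k(counts, reward_sums, k, delta=0.05):
    """Offline CMAB sketch in the spirit of CLCB: build a lower
    confidence bound (pessimistic estimate) for each base arm from
    logged data, then hand the LCBs to a combinatorial solver. Here the
    'solver' is simply top-k selection; the real framework plugs in an
    approximation oracle suited to the problem."""
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(reward_sums, dtype=float) / np.maximum(counts, 1)
    # Hoeffding-style pessimism: shrink arms with little offline coverage.
    radius = np.sqrt(np.log(2 * len(counts) / delta) / (2 * np.maximum(counts, 1)))
    lcb = means - radius
    return np.argsort(lcb)[-k:]  # super arm maximizing total LCB

print(pessimistic_top_k(counts=[200, 5, 120, 60],
                        reward_sums=[120, 4.5, 80, 30], k=2))
```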
- Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Preference-Based Multi-Agent Reinforcement Learning (PbMARL). We identify the Nash equilibrium from a preference-only offline dataset in general-sum games. Our findings underscore the multifaceted approach required for PbMARL.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Meta Clustering of Neural Bandits [45.77505279698894]
We study a new problem, Clustering of Neural Bandits, by extending previous work to arbitrary reward functions.
We propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters.
In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines.
arXiv Detail & Related papers (2024-08-10T16:09:51Z)
- Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second-price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z)
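The entry above alludes to combining second-price-auction ideas with bandit algorithms when the principal knows nothing about the arms. A toy sketch of the second-price mechanism itself; how it interleaves with the bandit algorithm is not specified in the summary and is omitted.

```python
def second_price_round(reported_values):
    """Toy illustration of the second-price idea: the principal picks
    the arm with the highest report but benchmarks it against the
    runner-up, so inflating one's report yields no extra payoff. This is
    the classical incentive argument the abstract borrows from auctions."""
    ranked = sorted(reported_values, key=reported_values.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, reported_values[runner_up]  # arm played, benchmark price

winner, benchmark = second_price_round({"arm_a": 0.9, "arm_b": 0.7, "arm_c": 0.4})
print(winner, benchmark)
```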
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to its own, possibly unknown, policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning [30.605881670761853]
We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs.
We show that such an approach achieves sub-linear regret in terms of the number of episodes.
arXiv Detail & Related papers (2023-06-01T03:43:53Z)
- Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
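The entry above builds on standard posterior sampling. For reference, a minimal Bernoulli Thompson sampling loop; the virtual-helping-agents modification is not described in the summary, so only the base algorithm is sketched.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]           # unknown to the learner
alpha = np.ones(len(true_means))       # Beta posterior: successes + 1
beta = np.ones(len(true_means))        # Beta posterior: failures + 1

for t in range(5000):
    theta = rng.beta(alpha, beta)      # sample one plausible mean per arm
    arm = int(np.argmax(theta))        # act greedily on the sample
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward               # Bayesian update
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))          # posterior means concentrate on arm 2
```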
- Distributed Stochastic Bandit Learning with Context Distributions [0.0]
We study the problem of distributed multi-armed contextual bandits with unknown contexts.
In our model, an adversary chooses a distribution over the set of possible contexts; the agents observe only this distribution, while the exact context remains unknown to them.
Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward.
arXiv Detail & Related papers (2022-07-28T22:00:11Z)
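In the entry above, agents act knowing only the context distribution. One natural reading is to average each action's features over that distribution before scoring optimistically; a sketch under that assumption, where `feature`, `theta_hat`, and `bonus` are hypothetical stand-ins for the paper's estimator and confidence widths:

```python
import numpy as np

def act_from_context_distribution(context_dist, feature, theta_hat, bonus):
    """Average each action's feature vector over the announced context
    distribution, then score it with an optimistic (UCB-style) estimate.
    This replaces the unobservable exact context with its expectation."""
    scores = {}
    for a in range(feature.shape[0]):
        # E_c[phi(a, c)] under the announced context distribution
        avg_phi = sum(p * feature[a, c] for c, p in context_dist.items())
        scores[a] = avg_phi @ theta_hat + bonus[a]
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
feature = rng.uniform(size=(4, 3, 2))  # (action, context, feature_dim)
print(act_from_context_distribution({0: 0.2, 1: 0.5, 2: 0.3}, feature,
                                    theta_hat=np.array([1.0, -0.5]),
                                    bonus=np.full(4, 0.1)))
```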
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs [11.1546439770774]
We present a new type of acquisition function for online decision making in bandit problems with extreme payoffs.
We formulate a novel type of upper confidence bound (UCB) acquisition function that guides exploration towards the bandits that are deemed most relevant.
arXiv Detail & Related papers (2021-02-19T18:36:03Z)
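The entry above proposes acquisition functions that bias exploration toward arms capable of extreme payoffs. A sketch of one such output-weighted UCB score, using observed payoff variance as a hypothetical stand-in for the paper's output-likelihood-based weighting:

```python
import numpy as np

def output_weighted_ucb(means, variances, counts, t):
    """Sketch of an output-weighted UCB acquisition score: the usual
    exploration bonus is rescaled per arm by its observed payoff
    variance, so arms that may produce extreme rewards attract more
    exploration. The variance-based weight is an assumption, not the
    paper's exact acquisition function."""
    means = np.asarray(means, float)
    counts = np.maximum(np.asarray(counts, float), 1)
    weights = np.asarray(variances, float) / np.mean(variances)
    bonus = np.sqrt(2 * np.log(t) / counts)
    return int(np.argmax(means + weights * bonus))

print(output_weighted_ucb(means=[0.4, 0.5, 0.45], variances=[0.1, 0.02, 0.5],
                          counts=[10, 10, 10], t=30))
```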
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for convex polytopes, the proposed method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)