Honor Among Bandits: No-Regret Learning for Online Fair Division
- URL: http://arxiv.org/abs/2407.01795v2
- Date: Sat, 17 Aug 2024 01:53:00 GMT
- Title: Honor Among Bandits: No-Regret Learning for Online Fair Division
- Authors: Ariel D. Procaccia, Benjamin Schiffer, Shirley Zhang
- Abstract summary: We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means.
Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint.
- Score: 20.38824614301761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space. We also prove a lower bound of $\tilde{\Omega}(T^{2/3})$ regret for our setting, showing that our results are tight.
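As a rough illustration of the reduction described in the abstract, the following is a minimal explore-then-commit sketch, not the paper's algorithm: it assumes the good-type arrival frequencies (`type_freq`) are known, enforces only proportionality in expectation with respect to the empirical mean values, and relies on hypothetical oracles `draw_type` and `draw_value` for item arrivals and realized values.

```python
"""Illustrative explore-then-commit sketch for online fair division with
unknown mean values (a simplified reading of the abstract, not the paper's
exact algorithm)."""
import numpy as np
from scipy.optimize import linprog


def explore_then_commit(T, n_players, n_types, type_freq, draw_type, draw_value, seed=0):
    """Explore for ~T^{2/3} rounds, then commit to an LP-optimal allocation rule.

    draw_type()      -> type (0..n_types-1) of the item arriving this round
    draw_value(i, g) -> realized value of player i for an item of type g
    type_freq        -> assumed-known arrival frequency of each good type
    """
    rng = np.random.default_rng(seed)
    T_explore = min(T, int(T ** (2 / 3)))
    welfare = 0.0

    # --- Exploration: uniformly random allocation to estimate mean values.
    # (Uniform allocation gives every player the same expected share of each
    #  type, so it is itself envy-free and proportional in expectation.)
    sums = np.zeros((n_players, n_types))
    counts = np.zeros((n_players, n_types))
    for _ in range(T_explore):
        g = draw_type()
        i = int(rng.integers(n_players))
        v = draw_value(i, g)
        sums[i, g] += v
        counts[i, g] += 1
        welfare += v
    mu_hat = sums / np.maximum(counts, 1)  # empirical mean values

    # --- Commit: LP over allocation probabilities p[i, g].
    # maximize  sum_{i,g} q_g * p[i,g] * mu_hat[i,g]      (estimated welfare)
    # s.t.      sum_i p[i,g] = 1 for every type g, and, for every player i,
    #           sum_g q_g p[i,g] mu_hat[i,g] >= (1/n) sum_g q_g mu_hat[i,g]
    #           (proportionality in expectation w.r.t. the empirical means).
    q = np.asarray(type_freq, dtype=float)
    c = -(q[None, :] * mu_hat).ravel()                     # linprog minimizes
    A_eq = np.zeros((n_types, n_players * n_types))
    for g in range(n_types):
        A_eq[g, g::n_types] = 1.0                          # column g of p sums to 1
    b_eq = np.ones(n_types)
    A_ub = np.zeros((n_players, n_players * n_types))
    b_ub = np.zeros(n_players)
    for i in range(n_players):
        A_ub[i, i * n_types:(i + 1) * n_types] = -(q * mu_hat[i])
        b_ub[i] = -(q * mu_hat[i]).sum() / n_players
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    p = np.clip(res.x.reshape(n_players, n_types), 0.0, 1.0)

    # --- Exploitation: allocate each arriving item by sampling from p[:, g].
    for _ in range(T - T_explore):
        g = draw_type()
        i = int(rng.choice(n_players, p=p[:, g] / p[:, g].sum()))
        welfare += draw_value(i, g)
    return welfare, p
```

The $T^{2/3}$-length exploration phase is what gives this sketch its $\tilde{O}(T^{2/3})$ flavor; the paper's actual analysis additionally exploits structural properties of the fairness constraints and covers envy-freeness in expectation as well.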
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Imprecise Multi-Armed Bandits [0.0]
We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes.
We then define a notion of regret corresponding to the lower prevision defined by these credal sets.
arXiv Detail & Related papers (2024-05-09T10:58:40Z) - Trading-off price for data quality to achieve fair online allocation [25.154957931903525]
We consider the problem of online allocation subject to a long-term fairness penalty.
We propose an algorithm that jointly solves both problems and show that it has a regret bounded by $\mathcal{O}(\sqrt{T})$.
arXiv Detail & Related papers (2023-06-23T11:09:43Z) - Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via projected gradient step.
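To make this update pattern concrete, here is a generic single-constraint primal-dual sketch (an assumption-laden simplification, not the paper's multi-agent algorithm): an online mirror descent step with the entropic mirror map (i.e., a multiplicative-weights update) on a simplex-constrained primal variable, and a projected gradient step on a nonnegative, capped dual variable. The oracles `reward_grad`, `cost`, `cost_grad` and the parameter `budget` are hypothetical.

```python
"""Generic Lagrangian primal-dual step: entropic mirror descent on the primal,
projected gradient on the dual (an illustrative sketch, not the paper's method)."""
import numpy as np


def primal_dual_step(x, lam, reward_grad, cost, cost_grad, budget,
                     eta_x=0.1, eta_lam=0.1, lam_max=10.0):
    # Lagrangian L(x, lam) = reward(x) - lam * (cost(x) - budget):
    # ascend in the primal x (a distribution), descend in the dual lam >= 0.
    grad_x = reward_grad(x) - lam * cost_grad(x)

    # Mirror descent with the entropy regularizer = multiplicative-weights
    # update, which keeps x on the probability simplex.
    x_new = x * np.exp(eta_x * grad_x)
    x_new /= x_new.sum()

    # Projected gradient step on the dual: increase lam when the constraint
    # cost(x) <= budget is violated, then project back onto [0, lam_max].
    lam_new = float(np.clip(lam + eta_lam * (cost(x) - budget), 0.0, lam_max))
    return x_new, lam_new
```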
arXiv Detail & Related papers (2023-05-31T22:09:24Z) - The price of unfairness in linear bandits with biased feedback [62.25313751895011]
We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst-case regret is higher than the $d T^{1/2} \log(T)$ regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
arXiv Detail & Related papers (2022-03-18T08:03:20Z) - Bandit problems with fidelity rewards [7.154621689269006]
The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward depending on how 'loyal' the player has been to that arm in the past.
In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played.
In the subscription model the additional reward depends on the current number of consecutive draws of the arm.
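One way to write down the two fidelity models sketched above (a hedged formalization; $X_{k,t}$ is the base stochastic reward of arm $k$ at time $t$ and $f$ an assumed fidelity function):
$$\text{loyalty-points:}\quad r_{k,t} = X_{k,t} + f(N_{k,t}), \qquad N_{k,t} = \#\{s < t : \text{arm } k \text{ pulled at step } s\},$$
$$\text{subscription:}\quad r_{k,t} = X_{k,t} + f(C_{k,t}), \qquad C_{k,t} = \text{length of the current run of consecutive pulls of arm } k.$$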
arXiv Detail & Related papers (2021-11-25T11:09:43Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z) - Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.