Fairness of Exposure in Online Restless Multi-armed Bandits
- URL: http://arxiv.org/abs/2402.06348v1
- Date: Fri, 9 Feb 2024 11:53:27 GMT
- Title: Fairness of Exposure in Online Restless Multi-armed Bandits
- Authors: Archit Sood, Shweta Jain and Sujit Gujar
- Abstract summary: Restless multi-armed bandits (RMABs) generalize multi-armed bandits: each arm exhibits Markovian behavior and evolves according to its own transition dynamics.
We show that our algorithm achieves sublinear fairness regret of $O(\sqrt{T\ln T})$ in the single-pull case, with $T$ being the total number of episodes. Empirically, we show that our algorithm also performs well in the multi-pull scenario.
- Score: 8.071147275221973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restless multi-armed bandits (RMABs) generalize multi-armed bandits:
each arm exhibits Markovian behavior and evolves according to its own
transition dynamics. Solutions to RMABs exist for both offline and online cases.
However, they do not consider the distribution of pulls among the arms. Studies
have shown that optimal policies lead to unfairness, where some arms are not
exposed enough. Existing work on fairness in RMABs focuses heavily on the
offline case, which limits its applicability in real-world scenarios where
the environment is largely unknown. For the online scenario, we propose the
first fair RMAB framework, in which each arm receives pulls in proportion to its
merit. We define the merit of an arm as a function of its stationary reward
distribution. We prove that our algorithm achieves a sublinear fairness regret
of $O(\sqrt{T\ln T})$ in the single-pull case, with $T$ being the total number
of episodes. Empirically, we show that our algorithm performs well in the
multi-pull scenario as well.
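Below is a minimal sketch of the merit-proportional exposure idea described in the abstract: each arm's merit is derived from its stationary reward distribution, and pull probabilities are allocated in proportion to merit. The transition matrices, reward vectors, and the merit function `f` are illustrative assumptions, not the paper's definitions or algorithm.

```python
import numpy as np

# Sketch only: merit-proportional pull allocation from stationary reward
# distributions. All arm dynamics and the merit function are made-up examples.

def stationary_distribution(P):
    """Stationary distribution of an ergodic Markov chain with transition matrix P."""
    n = P.shape[0]
    # Solve pi @ P = pi together with sum(pi) = 1 via least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def merit(P, rewards, f=lambda x: x + 1e-6):
    """Merit of an arm: a positive function f of its stationary expected reward (assumed form)."""
    return f(stationary_distribution(P) @ rewards)

# Two toy two-state arms: (transition matrix, state-dependent rewards).
arms = [
    (np.array([[0.9, 0.1], [0.3, 0.7]]), np.array([0.0, 1.0])),
    (np.array([[0.5, 0.5], [0.5, 0.5]]), np.array([0.2, 0.8])),
]

merits = np.array([merit(P, r) for P, r in arms])
pull_probs = merits / merits.sum()  # fair exposure: each arm pulled in proportion to its merit
print("merit-proportional pull probabilities:", pull_probs)
```

In the online setting considered by the paper the dynamics are unknown and must be learned, so the allocation above would be computed from estimated rather than true stationary distributions; this snippet only illustrates the target notion of merit-proportional exposure.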
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Global Rewards in Restless Multi-Armed Bandits [37.918982196934216]
Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states.
Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms.
We propose the restless multi-armed bandit with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards.
arXiv Detail & Related papers (2024-06-02T13:13:46Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints [17.403031677689427]
We introduce a new RMAB model with "long-term fairness constraints" (RMAB-F).
For the online RMAB-F setting, the underlying MDPs associated with each arm are unknown to the decision maker (DM).
We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret.
arXiv Detail & Related papers (2023-12-16T03:35:56Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Learning in Restless Bandits under Exogenous Global Markov Process [13.836565669337057]
We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics.
Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule.
We develop the Learning under Exogenous Markov Process (LEMP) algorithm to minimize regret.
arXiv Detail & Related papers (2021-12-17T12:47:30Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Fair Algorithms for Multi-Agent Multi-Armed Bandits [29.68201160277817]
We propose a multi-agent variant of the classical multi-armed bandit problem.
The goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally.
We show that multi-agent variants of three classic multi-armed bandit algorithms achieve sublinear regret.
arXiv Detail & Related papers (2020-07-13T21:20:04Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)