Related papers: Towards Soft Fairness in Restless Multi-Armed Bandits

Towards Soft Fairness in Restless Multi-Armed Bandits

URL: http://arxiv.org/abs/2207.13343v1
Date: Wed, 27 Jul 2022 07:56:32 GMT
Title: Towards Soft Fairness in Restless Multi-Armed Bandits
Authors: Dexun Li, Pradeep Varakantham
Abstract summary: Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty. To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint. We then provide an approach to enforce the soft fairness constraint in RMABs.
Score: 8.140037969280716
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely interventions to ensure maximum benefit in public health settings (e.g., ensuring patients take medicines in tuberculosis settings, ensuring pregnant mothers listen to automated calls about good pregnancy practices). Due to the limited resources, typically certain communities or regions are starved of interventions that can have follow-on effects. To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs. The soft fairness constraint requires that an algorithm never probabilistically favor one arm over another if the long-term cumulative reward of choosing the latter arm is higher. Our approach incorporates softmax based value iteration method in the RMAB setting to design selection algorithms that manage to satisfy the proposed fairness constraint. Our method, referred to as SoftFair, also provides theoretical performance guarantees and is asymptotically optimal. Finally, we demonstrate the utility of our approaches on simulated benchmarks and show that the soft fairness constraint can be handled without a significant sacrifice on value.

Related papers

Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health [52.79219652923714]
This paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs. We demonstrate improved outcomes in a maternal and child health telehealth program.
arXiv Detail & Related papers (2024-12-11T15:28:04Z)
Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization [9.591164070876689]
This paper presents a unified optimization framework for fair empirical risk based on f-divergence measures (f-FERM) In addition, our experiments demonstrate the superiority of fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes. Our extension is based on a distributionally robust optimization reformulation of f-FERM objective under $L_p$ norms as uncertainty sets.
arXiv Detail & Related papers (2023-12-06T03:14:16Z)
Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems. We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INFclip) for MAB problems with heavy-tailed settings. We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
arXiv Detail & Related papers (2023-05-11T12:00:43Z)
Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits [30.532795983761314]
We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. We first show that Whittle index policies can fail in simple and practically relevant settings. We then propose an alternate planning algorithm based on the mean-field method.
arXiv Detail & Related papers (2022-10-31T19:35:15Z)
Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits [8.140037969280716]
Restless Multi-Armed Bandits (RMAB) is an apt model to represent decision-making problems in public health interventions. In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value.
arXiv Detail & Related papers (2022-06-08T13:28:29Z)
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family. We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
Planning to Fairly Allocate: Probabilistic Fairness in the Restless Bandit Setting [30.120134596715154]
We introduce ProbFair, a probabilistically fair policy that maximizes total expected reward and satisfies a budget constraint. We evaluate our algorithm on a real-world application, where interventions support continuous positive airway pressure (CPAP) therapy adherence among patients.
arXiv Detail & Related papers (2021-06-14T18:01:08Z)
Collapsing Bandits and Their Application to Public Health Interventions [45.45852113386041]
Collpasing Bandits is a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process. We build on the Whittle index technique for RMABs to derive conditions under which the Collapsing Bandits problem is indexable. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques.
arXiv Detail & Related papers (2020-07-05T00:33:30Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.