Learning Optimal and Fair Policies for Online Allocation of Scarce
Societal Resources from Data Collected in Deployment
- URL: http://arxiv.org/abs/2311.13765v1
- Date: Thu, 23 Nov 2023 01:40:41 GMT
- Title: Learning Optimal and Fair Policies for Online Allocation of Scarce
Societal Resources from Data Collected in Deployment
- Authors: Bill Tang, Çağıl Koçyiğit, Eric Rice, Phebe Vayanos
- Abstract summary: We use administrative data collected in deployment to design an online policy that maximizes expected outcomes while satisfying budget constraints.
We show that using our policies improves rates of exit from homelessness by 1.9% and that policies that are fair in either allocation or outcomes by race come at a very low price of fairness.
- Score: 5.0904557821667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of allocating scarce societal resources of different
types (e.g., permanent housing, deceased donor kidneys for transplantation,
ventilators) to heterogeneous allocatees on a waitlist (e.g., people
experiencing homelessness, individuals suffering from end-stage renal disease,
Covid-19 patients) based on their observed covariates. We leverage
administrative data collected in deployment to design an online policy that
maximizes long-run expected outcomes while satisfying budget constraints. Our
proposed policy waitlists each individual for the resource that maximizes the
difference between their estimated mean treatment outcome and the estimated
resource dual price, which is, roughly, the opportunity cost of using the
resource. Resources are then allocated as they arrive, in a first-come, first-served
fashion. We demonstrate that our data-driven policy almost surely
asymptotically achieves the expected outcome of the optimal out-of-sample
policy under mild technical assumptions. We extend our framework to incorporate
various fairness constraints. We evaluate the performance of our approach on
the problem of designing policies for allocating scarce housing resources to
people experiencing homelessness in Los Angeles based on data from the homeless
management information system. In particular, we show that using our policies
improves rates of exit from homelessness by 1.9% and that policies that are
fair in either allocation or outcomes by race come at a very low price of
fairness.
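The scoring rule at the heart of this policy is simple enough to sketch. Below is a minimal, illustrative Python sketch of the waitlisting step and the first-come, first-served allocation described in the abstract; the model interface (outcome_models, dual_prices) and the resource labels are hypothetical placeholders, not the authors' code, and in the paper the dual prices are themselves estimated from the deployment data.

from collections import deque

# Minimal sketch of the waitlisting rule from the abstract; all names are
# illustrative assumptions, not the authors' implementation.
#   * outcome_models[r]: a fitted regressor estimating the mean treatment
#     outcome mu_r(x) of giving resource type r to an individual with
#     covariates x.
#   * dual_prices[r]: an estimate lambda_r of resource r's dual price,
#     i.e., roughly, the opportunity cost of using one unit of r.

def choose_waitlist(x, outcome_models, dual_prices):
    """Waitlist individual x for the resource maximizing the estimated
    mean treatment outcome minus the estimated dual price."""
    return max(
        dual_prices,  # iterate over resource types
        key=lambda r: outcome_models[r].predict([x])[0] - dual_prices[r],
    )

# Resources are then dispensed first-come, first-served: each resource
# type keeps a FIFO queue of waitlisted individuals.
RESOURCE_TYPES = ["permanent_housing", "rapid_rehousing", "service_only"]
waitlists = {r: deque() for r in RESOURCE_TYPES}

def on_individual_arrival(x, outcome_models, dual_prices):
    """An arriving individual joins the queue of their score-maximizing resource."""
    waitlists[choose_waitlist(x, outcome_models, dual_prices)].append(x)

def on_resource_arrival(r):
    """An arriving unit of resource r goes to the earliest-waitlisted person."""
    return waitlists[r].popleft() if waitlists[r] else None

In this sketch an individual's queue assignment is fixed on arrival and units are dispensed strictly in arrival order, matching the online, first-come, first-served allocation described above; fairness constraints would plausibly enter by adjusting the scores or the dual prices, though the abstract does not spell out the mechanism.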
Related papers
- Policy Learning with Distributional Welfare [1.0742675209112622]
Most of the literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE).
arXiv Detail & Related papers (2023-11-27T14:51:30Z)
- Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources [47.57108369791273]
Scarcity of health care resources can make rationing unavoidable.
There is no universally accepted standard for health care resource allocation protocols.
We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients.
arXiv Detail & Related papers (2023-09-15T17:28:06Z)
- Bayesian Inverse Transition Learning for Offline Settings [30.10905852013852]
Reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education.
We propose a new constraint-based approach that captures our desiderata for reliably learning a posterior distribution of the transition dynamics $T$.
Our results demonstrate that by using our constraints, we learn a high-performing policy, while considerably reducing the policy's variance over different datasets.
arXiv Detail & Related papers (2023-08-09T17:08:29Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of full-episode rewards, are optimized (a rough sketch of such an objective appears after this list).
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Learning Resource Allocation Policies from Observational Data with an Application to Homeless Services Delivery [9.65131987576314]
We study the problem of learning, from observational data, fair and interpretable policies that effectively match heterogeneous individuals to scarce resources of different types.
We conduct extensive analyses using synthetic and real-world data.
arXiv Detail & Related papers (2022-01-25T02:32:55Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, in terms of both public health and economic outcomes, relative to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
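As flagged in the risk-sensitive policy optimization entry above, here is a rough sketch, in our own notation, of what a CDF-based risk-sensitive objective can look like (an illustration of the general idea, not necessarily that paper's exact formulation). Let $F_\pi$ denote the CDF of the full-episode reward under policy $\pi$; a quantile-weighted objective reweights outcomes by a profile $w$ over quantile levels $\tau$:

\[
  \max_{\pi} \;\; \int_0^1 w(\tau)\, F_\pi^{-1}(\tau)\, \mathrm{d}\tau .
\]

A "pessimistic" profile places more weight on small $\tau$ (episodes where the agent performs poorly), while the constant profile $w(\tau) \equiv 1$ recovers the risk-neutral expected reward $\mathbb{E}_\pi[R]$.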