Towards Return Parity in Markov Decision Processes
- URL: http://arxiv.org/abs/2111.10476v1
- Date: Fri, 19 Nov 2021 23:25:38 GMT
- Title: Towards Return Parity in Markov Decision Processes
- Authors: Jianfeng Chi, Jian Shen, Xinyi Dai, Weinan Zhang, Yuan Tian, Han Zhao
- Abstract summary: We study a fairness problem in Markov decision processes (MDPs)
We propose return parity, a fairness notion that requires MDPs from different demographic groups to achieve the same expected rewards.
Motivated by our decomposition theorem, we propose algorithms to mitigate return disparity via learning a shared group policy with state visitation distributional alignment.
- Score: 36.96748490812215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithmic decisions made by machine learning models in high-stakes domains
may have lasting impacts over time. Unfortunately, naive applications of
standard fairness criterion in static settings over temporal domains may lead
to delayed and adverse effects. To understand the dynamics of performance
disparity, we study a fairness problem in Markov decision processes (MDPs).
Specifically, we propose return parity, a fairness notion that requires MDPs
from different demographic groups that share the same state and action spaces
to achieve approximately the same expected time-discounted rewards. We first
provide a decomposition theorem for return disparity, which decomposes the
return disparity of any two MDPs into the distance between group-wise reward
functions, the discrepancy of group policies, and the discrepancy between state
visitation distributions induced by the group policies. Motivated by our
decomposition theorem, we propose algorithms to mitigate return disparity via
learning a shared group policy with state visitation distributional alignment
using integral probability metrics. We conduct experiments to corroborate our
results, showing that the proposed algorithm can successfully close the
disparity gap while maintaining the performance of policies on two real-world
recommender system benchmark datasets.
Related papers
- Fair Resource Allocation in Weakly Coupled Markov Decision Processes [3.824858358548714]
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes.
We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective.
arXiv Detail & Related papers (2024-11-14T20:40:55Z) - Reconciling Heterogeneous Effects in Causal Inference [44.99833362998488]
We apply the Reconcile algorithm for model multiplicity in machine learning to reconcile heterogeneous effects in causal inference.
Our results have tangible implications for ensuring fair outcomes in high-stakes such as healthcare, insurance, and housing.
arXiv Detail & Related papers (2024-06-05T18:43:46Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper tries to learn the diverse policies from the history of state-action pairs under a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Repairing Regressors for Fair Binary Classification at Any Decision
Threshold [8.322348511450366]
We show that we can increase fair performance across all thresholds at once.
We introduce a formal measure of Distributional Parity, which captures the degree of similarity in the distributions of classifications for different protected groups.
Our main result is to put forward a novel post-processing algorithm based on optimal transport, which provably maximizes Distributional Parity.
arXiv Detail & Related papers (2022-03-14T20:53:35Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean
and Variance [3.062772835338966]
This paper investigates the optimization problem of an infinite stage discrete time Markov decision process (MDP) with a long-run average metric.
A performance difference formula is derived and it can quantify the difference of the mean-variance combined metrics of MDPs under any two different policies.
A necessary condition of the optimal policy and the optimality of deterministic policies are derived.
arXiv Detail & Related papers (2020-08-09T10:35:35Z) - Learning Overlapping Representations for the Estimation of
Individualized Treatment Effects [97.42686600929211]
Estimating the likely outcome of alternatives from observational data is a challenging problem.
We show that algorithms that learn domain-invariant representations of inputs are often inappropriate.
We develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmarks data sets.
arXiv Detail & Related papers (2020-01-14T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.