State-Visitation Fairness in Average-Reward MDPs
- URL: http://arxiv.org/abs/2102.07120v1
- Date: Sun, 14 Feb 2021 10:20:53 GMT
- Title: State-Visitation Fairness in Average-Reward MDPs
- Authors: Ganesh Ghalme, Vineet Nair, Vishakha Patil, Yilun Zhou
- Abstract summary: We study fairness in temporally extended decision-making settings, specifically those formulated as Markov Decision Processes (MDPs).
Our proposed notion of fairness ensures that each state's long-term visitation frequency is more than a specified fraction.
The proposed solution guarantees a simultaneous approximation on the expected average-reward and the long-term state-visitation frequency.
- Score: 5.190207094732672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness has emerged as an important concern in automated decision-making in
recent years, especially when these decisions affect human welfare. In this
work, we study fairness in temporally extended decision-making settings,
specifically those formulated as Markov Decision Processes (MDPs). Our proposed
notion of fairness ensures that each state's long-term visitation frequency is
more than a specified fraction. In an average-reward MDP (AMDP) setting, we
formulate the problem as a bilinear saddle point program and, for a generative
model, solve it using a Stochastic Mirror Descent (SMD) based algorithm. The
proposed solution guarantees a simultaneous approximation on the expected
average-reward and the long-term state-visitation frequency. We validate our
theoretical results with experiments on synthetic data.
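To make the abstract's formulation concrete, below is a minimal sketch, not the authors' reference implementation, of how a fairness-constrained average-reward problem of this kind is commonly written as a bilinear saddle point over the state-action occupancy measure and then solved with Stochastic Mirror Descent (SMD) under generative-model access. The tiny synthetic MDP, the fairness threshold zeta, the step sizes, the projection radii, and all variable names are illustrative assumptions, not values or code from the paper.

```python
# Sketch under assumptions (not the paper's code): the occupancy measure mu(s, a)
# is the max player, updated by entropic mirror descent on the simplex; the
# flow-constraint duals v(s) and the fairness multipliers lam(s) (enforcing
# sum_a mu(s, a) >= zeta for every state s) form the min player, updated by
# projected gradient descent. Transitions for the mu-gradient are drawn from a
# generative model, so the mu update uses an unbiased stochastic gradient.
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic average-reward MDP.
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
r = rng.uniform(size=(S, A))                 # rewards in [0, 1]
zeta = 0.05                                  # required long-term visitation fraction per state

def sample_next_state(s, a):
    """Generative-model access: draw s' ~ P(. | s, a)."""
    return rng.choice(S, p=P[s, a])

mu = np.full((S, A), 1.0 / (S * A))   # occupancy measure (a point on the simplex)
v = np.zeros(S)                       # duals for the stationarity (flow) constraints
lam = np.zeros(S)                     # multipliers for the fairness constraints
eta_mu, eta_dual = 0.1, 0.1           # illustrative step sizes
V_MAX, L_MAX = 5.0, 5.0               # projection radii keeping the dual iterates bounded

for _ in range(20000):
    # Unbiased gradient for mu(s, a): r(s, a) + v(s') - v(s) + lam(s), with s' sampled.
    g_mu = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            g_mu[s, a] = r[s, a] + v[sample_next_state(s, a)] - v[s] + lam[s]

    # Exponentiated-gradient (entropic mirror) ascent step, then renormalize to the simplex.
    mu *= np.exp(eta_mu * g_mu)
    mu /= mu.sum()

    # Projected gradient descent for the min player (flow duals and fairness multipliers).
    visit = mu.sum(axis=1)  # long-term state-visitation frequencies implied by mu
    flow_gap = np.array([(P[:, :, s2] * mu).sum() for s2 in range(S)]) - visit
    v = np.clip(v - eta_dual * flow_gap, -V_MAX, V_MAX)
    lam = np.clip(lam - eta_dual * (visit - zeta), 0.0, L_MAX)

print("expected average reward:", float((mu * r).sum()))
print("state-visitation frequencies:", mu.sum(axis=1))
```

In standard SMD analyses of bilinear saddle points it is the averaged occupancy-measure iterate, rather than the last one, that carries the simultaneous approximation guarantee on the average reward and the visitation frequencies; the sketch above omits that averaging for brevity.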
Related papers
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z) - Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs [3.1139806580181006]
We address the challenge of quantifying Bayesian uncertainty in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics.
We use standard Bayesian reinforcement learning methods to capture the posterior uncertainty in MDP parameters.
We then analytically compute the first two moments of the return distribution across posterior samples and apply the law of total variance.
We highlight the real-world impact and computational scalability of our method by applying it to the AI Clinician problem.
arXiv Detail & Related papers (2024-06-04T16:21:14Z) - Long-Term Fair Decision Making through Deep Generative Models [12.333165351086171]
This paper studies long-term fair machine learning, which aims to mitigate group disparity over the long term in sequential decision-making systems.
We leverage the temporal causal graph and use the 1-Wasserstein distance between the interventional distributions of different demographic groups at a sufficiently large time step as the quantitative metric.
We propose a three-phase learning framework where the decision model is trained on high-fidelity data generated by a deep generative model.
arXiv Detail & Related papers (2024-01-20T17:44:50Z) - Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z) - Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP (termination MDP) and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Achieving Long-Term Fairness in Sequential Decision Making [9.046461405943502]
We propose a framework for achieving long-term fair sequential decision making.
We take path-specific effects on the time-lagged causal graph as a quantitative tool for measuring long-term fairness.
arXiv Detail & Related papers (2022-04-04T20:05:44Z) - Solving the non-preemptive two queue polling model with generally distributed service and switch-over durations and Poisson arrivals as a Semi-Markov Decision Process [0.0]
The polling system with switch-over durations is a useful model with several practical applications.
It is classified as a Discrete Event Dynamic System (DEDS), for which no single agreed-upon modelling approach exists.
This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system to introduce additional modelling power.
arXiv Detail & Related papers (2021-12-13T11:40:55Z) - Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently with the same solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z) - Planning in Markov Decision Processes with Gap-Dependent Sample Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z)