Learning Optimal Individualized Decision Rules with Conditional Demographic Parity
- URL: http://arxiv.org/abs/2603.05226v1
- Date: Thu, 05 Mar 2026 14:39:32 GMT
- Title: Learning Optimal Individualized Decision Rules with Conditional Demographic Parity
- Authors: Wenhai Cui, Wen Su, Donglin Zeng, Xingqiu Zhao
- Abstract summary: We propose a novel framework that incorporates demographic parity (DP) and conditional demographic parity (CDP) constraints into the estimation of optimal IDRs. We show that the theoretically optimal IDRs under DP and CDP constraints can be obtained by applying perturbations to the unconstrained optimal IDRs.
- Score: 7.125803218132866
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Individualized decision rules (IDRs) have become increasingly prevalent in societal applications such as personalized marketing, healthcare, and public policy design. However, a critical ethical concern arises from the potential discriminatory effects of IDRs trained on biased data. These algorithms may disproportionately harm individuals from minority subgroups defined by sensitive attributes like gender, race, or language. To address this issue, we propose a novel framework that incorporates demographic parity (DP) and conditional demographic parity (CDP) constraints into the estimation of optimal IDRs. We show that the theoretically optimal IDRs under DP and CDP constraints can be obtained by applying perturbations to the unconstrained optimal IDRs, enabling a computationally efficient solution. Theoretically, we derive convergence rates for both policy value and the fairness constraint term. The effectiveness of our methods is illustrated through comprehensive simulation studies and an empirical application to the Oregon Health Insurance Experiment.
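The abstract's central idea, that a fairness-constrained rule can be written as a perturbation of the unconstrained one, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' estimator: the rule is taken to be "treat when an estimated benefit `tau` is positive", the perturbation is taken to be a group-specific threshold shift, and the names `dp_gap`, `perturbed_rule`, and `fit_shift` are hypothetical. The data are simulated.

```python
import numpy as np

def dp_gap(decisions, sensitive):
    """Absolute difference in treatment rates between the two sensitive groups."""
    decisions = np.asarray(decisions, dtype=float)
    sensitive = np.asarray(sensitive)
    return abs(decisions[sensitive == 1].mean() - decisions[sensitive == 0].mean())

def perturbed_rule(tau, sensitive, shift):
    """Treat when the estimated benefit exceeds a group-specific threshold.

    shift = 0 recovers the unconstrained rule 1{tau > 0}; a positive shift
    raises the threshold for group 1 and lowers it for group 0.
    (Assumed form of the perturbation, for illustration only.)
    """
    tau = np.asarray(tau, dtype=float)
    sensitive = np.asarray(sensitive)
    threshold = np.where(sensitive == 1, shift, -shift)
    return (tau > threshold).astype(int)

def fit_shift(tau, sensitive, grid):
    """Grid search for the shift with the smallest DP gap (ties: smallest |shift|)."""
    gaps = [dp_gap(perturbed_rule(tau, sensitive, s), sensitive) for s in grid]
    best = min(range(len(grid)), key=lambda i: (gaps[i], abs(grid[i])))
    return grid[best]

# Simulated data: group 1 has a higher estimated benefit on average,
# so the unconstrained rule treats it more often.
rng = np.random.default_rng(0)
n = 2000
sensitive = rng.integers(0, 2, size=n)
tau = rng.normal(loc=0.5 * sensitive, scale=1.0, size=n)

grid = np.linspace(-1.0, 1.0, 81)
shift = fit_shift(tau, sensitive, grid)
gap_before = dp_gap(perturbed_rule(tau, sensitive, 0.0), sensitive)
gap_after = dp_gap(perturbed_rule(tau, sensitive, shift), sensitive)
print(f"shift={shift:.3f}  DP gap before={gap_before:.3f}  after={gap_after:.3f}")
```

A conditional (CDP) variant of this sketch would repeat the same search within each stratum of a legitimate conditioning variable, again as an assumption rather than a description of the paper's procedure.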
Related papers
- Locally Private Nonparametric Contextual Multi-armed Bandits [10.579415536953132]
We address the challenge of nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP). We develop a uniform-confidence-bound-type estimator, showing its minimax optimality supported by a matching minimax lower bound.
arXiv Detail & Related papers (2025-03-11T07:00:57Z) - Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing [13.34215548232296]
Counterfactual fairness (CF) offers a promising statistical tool grounded in causal inference to formulate and study fairness. We theoretically characterize the optimal CF policy and prove its stationarity, which greatly simplifies the search for optimal CF policies. We prove and then validate our policy learning approach in controlling unfairness and attaining optimal value through simulations.
arXiv Detail & Related papers (2025-01-10T22:27:44Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Optimal and Fair Encouragement Policy Evaluation and Learning [9.036025934093963]
We study causal identification and robust estimation of optimal treatment rules, including under potential violations of positivity. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in three case studies based on data from reminders of SNAP benefits, randomized encouragement to enroll in insurance, and from pretrial supervised release with electronic monitoring.
arXiv Detail & Related papers (2023-09-12T20:45:30Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, which involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded. We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
arXiv Detail & Related papers (2022-12-19T22:43:08Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality for the learned policies is comparable to the rate as if data is not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Minimax Pareto Fairness: A Multi Objective Perspective [24.600419295290504]
Group fairness is a multi-objective optimization problem, where each sensitive group risk is a separate objective.
We provide a simple algorithm compatible with deep neural networks to satisfy these constraints.
We test the proposed methodology on real case-studies of predicting income, ICU patient mortality, skin lesions classification, and assessing credit risk.
arXiv Detail & Related papers (2020-11-03T16:21:53Z) - Fair Policy Targeting [0.6091702876917281]
One of the major concerns of targeting interventions on individuals in social welfare programs is discrimination.
This paper addresses the question of the design of fair and efficient treatment allocation rules.
arXiv Detail & Related papers (2020-05-25T20:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.