Optimal and Fair Encouragement Policy Evaluation and Learning
- URL: http://arxiv.org/abs/2309.07176v2
- Date: Sat, 25 Nov 2023 02:54:49 GMT
- Title: Optimal and Fair Encouragement Policy Evaluation and Learning
- Authors: Angela Zhou
- Abstract summary: We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
- Score: 11.712023983596914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In consequential domains, it is often impossible to compel individuals to
take treatment, so that optimal policy rules are merely suggestions in the
presence of human non-adherence to treatment recommendations. In these same
domains, there may be heterogeneity both in who responds by taking up
treatment and in treatment efficacy. While optimal treatment
rules can maximize causal outcomes across the population, access parity
constraints or other fairness considerations can be relevant in the case of
encouragement. For example, in social services, a persistent puzzle is the gap
in take-up of beneficial services among those who may benefit from them the
most. When in addition the decision-maker has distributional preferences over
both access and average outcomes, the optimal decision rule changes. We study
causal identification, statistical variance-reduced estimation, and robust
estimation of optimal treatment rules, including under potential violations of
positivity. We consider fairness constraints such as demographic parity in
treatment take-up, and other constraints, via constrained optimization. Our
framework can be extended to handle algorithmic recommendations under an
often-reasonable covariate-conditional exclusion restriction, using our
robustness checks for lack of positivity in the recommendation. We develop a
two-stage algorithm for solving over parametrized policy classes under general
constraints to obtain variance-sensitive regret bounds. We illustrate the
methods in two case studies based on data from randomized encouragement to
enroll in insurance and from pretrial supervised release with electronic
monitoring.
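To make the abstract's ingredients concrete, the following is a minimal, self-contained sketch (not the paper's algorithm): it combines a doubly-robust (AIPW) estimate of an encouragement policy's value with an inverse-propensity-weighted estimate of the gap in treatment take-up between two groups, and penalizes that gap while searching a tiny class of threshold policies. The synthetic data, the crude arm-mean outcome model, the penalty weight, and all variable names are illustrative assumptions.

```python
# A minimal sketch, assuming a randomized encouragement design with covariates X,
# group indicator A, encouragement Z, take-up T, and outcome Y. It is NOT the
# paper's algorithm: the outcome model, penalty weight, and policy class are toy
# illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic encouragement-design data (illustrative only).
X = rng.normal(size=(n, 2))                    # covariates
A = rng.integers(0, 2, size=n)                 # protected group indicator
Z = rng.integers(0, 2, size=n)                 # randomized encouragement, P(Z=1)=0.5
p_take = 1.0 / (1.0 + np.exp(-(0.5 * Z + 0.3 * X[:, 0] - 0.4 * A)))
T = rng.binomial(1, p_take)                    # observed take-up of the treatment
Y = 1.0 * T + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)  # outcome

e_z = np.full(n, 0.5)                          # known randomization propensity P(Z=1|X)

# Crude plug-in outcome model: mean outcome within each encouragement arm.
# A real analysis would regress Y on (X, Z) flexibly and cross-fit.
mu1 = np.full(n, Y[Z == 1].mean())
mu0 = np.full(n, Y[Z == 0].mean())


def aipw_value(pi):
    """Doubly-robust (AIPW) estimate of E[Y(pi(X))] for a binary encouragement policy."""
    mu_pi = np.where(pi == 1, mu1, mu0)
    prop = np.where(pi == 1, e_z, 1.0 - e_z)
    return np.mean(mu_pi + (Z == pi) * (Y - mu_pi) / prop)


def takeup_parity_gap(pi):
    """IPW estimate of |P(T=1 | A=0, policy pi) - P(T=1 | A=1, policy pi)|."""
    w = (Z == pi) / np.where(pi == 1, e_z, 1.0 - e_z)
    rates = []
    for a in (0, 1):
        m = A == a
        rates.append(np.sum(w[m] * T[m]) / np.sum(w[m]))
    return abs(rates[0] - rates[1])


# Penalized search over a tiny class of threshold policies pi(x) = 1{x_0 >= c}:
# estimated value minus a fixed multiple of the take-up parity gap.
lam = 2.0
best_score, best_c = -np.inf, None
for c in np.linspace(-2.0, 2.0, 41):
    pi = (X[:, 0] >= c).astype(int)
    score = aipw_value(pi) - lam * takeup_parity_gap(pi)
    if score > best_score:
        best_score, best_c = score, c

print(f"best threshold c = {best_c:.2f}, penalized value estimate = {best_score:.3f}")
```

A real analysis in the paper's setting would replace the arm-mean outcome model with flexible, cross-fit regression and would handle the parity constraint through the two-stage constrained-optimization procedure the abstract describes, rather than a fixed penalty.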
Related papers
- Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing [13.34215548232296]
Counterfactual fairness (CF) offers a promising statistical tool grounded in causal inference to formulate and study fairness.
We theoretically characterize the optimal CF policy and prove its stationarity, which greatly simplifies the search for optimal CF policies.
We prove, and validate through simulations, that our policy learning approach controls unfairness while attaining optimal value.
arXiv Detail & Related papers (2025-01-10T22:27:44Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it (a toy baseline-correction sketch appears after this related-papers list).
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- Privacy Preserving Adaptive Experiment Design [13.839525385976303]
We investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.
We propose differentially private algorithms that still match the lower bound, showing that privacy is "almost free".
arXiv Detail & Related papers (2024-01-16T09:22:12Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before the study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Policy Learning with Distributional Welfare [1.0742675209112622]
Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE).
arXiv Detail & Related papers (2023-11-27T14:51:30Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes.
By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods.
arXiv Detail & Related papers (2023-10-30T06:35:31Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain, ex post, the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Treatment recommendation with distributional targets [0.0]
We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment.
The desirability of the outcome distribution resulting from the treatment recommendation is measured through a functional distributional characteristic.
We propose two (near)-optimal regret policies.
arXiv Detail & Related papers (2020-05-19T19:27:21Z)
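In the same illustrative spirit, the sketch below relates to the "Optimal Baseline Corrections for Off-Policy Contextual Bandits" entry above: it contrasts a plain inverse-propensity-scoring (IPS) estimate of a target policy's reward with a baseline-corrected (control-variate) variant that stays unbiased while reducing variance. The synthetic logging data and the fixed baseline are assumptions for illustration; that paper's closed-form variance-optimal correction is not reproduced here.

```python
# A minimal sketch, assuming context-free logging and target policies over k
# actions; it shows the standard control-variate idea behind baseline-corrected
# IPS, not the cited paper's optimal baseline.
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 3                                   # logged rounds, number of actions

p_log = np.array([0.5, 0.3, 0.2])                # logging policy
true_mean = np.array([0.2, 0.5, 0.8])            # per-action expected reward (unknown in practice)

a = rng.choice(k, size=n, p=p_log)               # logged actions
r = rng.binomial(1, true_mean[a]).astype(float)  # logged rewards

p_tgt = np.array([0.1, 0.2, 0.7])                # target policy to evaluate offline

w = p_tgt[a] / p_log[a]                          # importance weights

ips = np.mean(w * r)                             # plain IPS estimate of E_target[r]

# Baseline correction: subtract a fixed constant b and add it back. Because
# E[w] = 1, the estimator stays unbiased for any fixed b, and a b close to the
# mean reward shrinks the variance of the weighted term.
b = 0.5                                          # illustrative fixed baseline
ips_corrected = b + np.mean(w * (r - b))

print(f"IPS: {ips:.3f}   baseline-corrected IPS: {ips_corrected:.3f}")
print(f"target policy's true value: {float(p_tgt @ true_mean):.3f}")
```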