Optimal and Fair Encouragement Policy Evaluation and Learning
- URL: http://arxiv.org/abs/2309.07176v2
- Date: Sat, 25 Nov 2023 02:54:49 GMT
- Title: Optimal and Fair Encouragement Policy Evaluation and Learning
- Authors: Angela Zhou
- Abstract summary: We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
- Score: 11.712023983596914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In consequential domains, it is often impossible to compel individuals to
take treatment, so that optimal policy rules are merely suggestions in the
presence of human non-adherence to treatment recommendations. In these same
domains, there may be heterogeneity both in who responds by taking up
treatment and in treatment efficacy. While optimal treatment
rules can maximize causal outcomes across the population, access parity
constraints or other fairness considerations can be relevant in the case of
encouragement. For example, in social services, a persistent puzzle is the gap
in take-up of beneficial services among those who may benefit from them the
most. When in addition the decision-maker has distributional preferences over
both access and average outcomes, the optimal decision rule changes. We study
causal identification, statistical variance-reduced estimation, and robust
estimation of optimal treatment rules, including under potential violations of
positivity. We consider fairness constraints such as demographic parity in
treatment take-up, and other constraints, via constrained optimization. Our
framework can be extended to handle algorithmic recommendations under an
often-reasonable covariate-conditional exclusion restriction, using our
robustness checks for lack of positivity in the recommendation. We develop a
two-stage algorithm for solving over parametrized policy classes under general
constraints to obtain variance-sensitive regret bounds. We illustrate the
methods in two case studies based on data from randomized encouragement to
enroll in insurance and from pretrial supervised release with electronic
monitoring.
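As a purely illustrative sketch of the ingredients the abstract describes (not the paper's two-stage algorithm), the snippet below combines doubly robust (AIPW) scoring of outcomes under an encouragement/recommendation Z, IPW estimation of group-wise take-up T, and a constrained search over a simple threshold policy class with a demographic-parity-in-take-up constraint. All variable names (X covariates, Z recommendation, T take-up, Y outcome, A protected group) and model choices are assumptions made here for exposition.

```python
# Minimal sketch only -- not the paper's algorithm. Stage 1 fits nuisance models
# and forms doubly robust scores; stage 2 searches a threshold policy class for
# the best estimated value subject to demographic parity in take-up.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def fit_nuisances(X, Z, Y):
    """Stage 1: recommendation propensity e(x) = P(Z=1|x) and outcome regressions mu_z(x)."""
    e_model = LogisticRegression(max_iter=1000).fit(X, Z)
    m1 = LinearRegression().fit(X[Z == 1], Y[Z == 1]).predict(X)
    m0 = LinearRegression().fit(X[Z == 0], Y[Z == 0]).predict(X)
    # Clipping is a crude guard against near-violations of positivity.
    e_hat = np.clip(e_model.predict_proba(X)[:, 1], 0.02, 0.98)
    return e_hat, m1, m0

def dr_outcome_scores(Z, Y, e_hat, m1, m0):
    """Doubly robust (AIPW) per-unit scores for the outcome under Z=1 and Z=0."""
    phi1 = m1 + Z * (Y - m1) / e_hat
    phi0 = m0 + (1 - Z) * (Y - m0) / (1 - e_hat)
    return phi1, phi0

def group_takeup(pi, Z, T, e_hat, A):
    """IPW estimate of the take-up rate under policy pi, per group a."""
    w = np.where(pi == 1, Z / e_hat, (1 - Z) / (1 - e_hat))
    return {a: np.mean(w[A == a] * T[A == a]) for a in np.unique(A)}

def learn_threshold_policy(score_x, phi1, phi0, Z, T, e_hat, A, eps=0.05):
    """Stage 2: among rules pi(x) = 1{score_x >= tau}, maximize the estimated value
    subject to a between-group take-up gap of at most eps."""
    candidates = []
    for tau in np.quantile(score_x, np.linspace(0.05, 0.95, 19)):
        pi = (score_x >= tau).astype(int)
        value = np.mean(pi * phi1 + (1 - pi) * phi0)
        rates = list(group_takeup(pi, Z, T, e_hat, A).values())
        candidates.append((tau, value, max(rates) - min(rates)))
    feasible = [c for c in candidates if c[2] <= eps]
    pool = feasible if feasible else candidates  # fall back if nothing is feasible
    return max(pool, key=lambda c: c[1])          # (threshold, value, take-up gap)
```

A natural (again, assumed) scoring function is the estimated effect of the recommendation itself, score_x = phi1 - phi0. The actual two-stage procedure in the paper handles general parametrized policy classes and yields variance-sensitive regret bounds, which this toy grid search does not.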
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it (a generic control-variate sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z) - Privacy Preserving Adaptive Experiment Design [13.839525385976303]
We investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.
We propose differentially private algorithms that still match the lower bound, showing that privacy is "almost free".
arXiv Detail & Related papers (2024-01-16T09:22:12Z) - Policy Learning with Distributional Welfare [1.0742675209112622]
Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE).
arXiv Detail & Related papers (2023-11-27T14:51:30Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Inference for relative sparsity [0.0]
We develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine.
Inference is difficult, because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case.
To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference.
arXiv Detail & Related papers (2023-06-25T17:14:45Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, which involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Policy Learning with Asymmetric Counterfactual Utilities [0.6138671548064356]
We consider optimal policy learning with asymmetric counterfactual utility functions.
We derive minimax decision rules by minimizing the maximum expected utility loss.
We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems.
arXiv Detail & Related papers (2022-06-21T15:44:49Z) - Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z) - Treatment recommendation with distributional targets [0.0]
We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment.
The desirability of the outcome distribution resulting from the treatment recommendation is measured through a functional distributional characteristic.
We propose two (near)-optimal regret policies.
arXiv Detail & Related papers (2020-05-19T19:27:21Z)
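The "Optimal Baseline Corrections for Off-Policy Contextual Bandits" entry above concerns variance-optimal unbiased off-policy estimation. As a hedged, generic illustration of the underlying control-variate idea (not that paper's exact closed-form solution), a baseline-corrected IPS estimate can be computed as follows; the function name and arguments are chosen here for exposition.

```python
# Generic control-variate sketch of baseline-corrected IPS off-policy evaluation.
# Not the cited paper's exact estimator; beta is the standard sample-based
# variance-minimizing coefficient for the mean-zero control variate (w - 1).
import numpy as np

def baseline_corrected_ips(p_target, p_logged, rewards):
    """p_target, p_logged: probabilities of each logged action under the target
    and logging policies; rewards: observed rewards for those actions."""
    w = p_target / np.clip(p_logged, 1e-6, None)    # importance weights, E[w] = 1
    C = np.cov(np.vstack([w * rewards, w]))         # 2x2 sample covariance matrix
    beta = C[0, 1] / max(C[1, 1], 1e-12)            # beta* = Cov(w*r, w) / Var(w)
    return np.mean(w * rewards - beta * (w - 1.0))  # baseline-corrected estimate
```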
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.