Optimal and Fair Encouragement Policy Evaluation and Learning
- URL: http://arxiv.org/abs/2309.07176v2
- Date: Sat, 25 Nov 2023 02:54:49 GMT
- Title: Optimal and Fair Encouragement Policy Evaluation and Learning
- Authors: Angela Zhou
- Abstract summary: We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
- Score: 11.712023983596914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In consequential domains, it is often impossible to compel individuals to
take treatment, so that optimal policy rules are merely suggestions in the
presence of human non-adherence to treatment recommendations. In these same
domains, there may be heterogeneity both in who responds by taking up
treatment and in treatment efficacy. While optimal treatment
rules can maximize causal outcomes across the population, access parity
constraints or other fairness considerations can be relevant in the case of
encouragement. For example, in social services, a persistent puzzle is the gap
in take-up of beneficial services among those who may benefit from them the
most. When in addition the decision-maker has distributional preferences over
both access and average outcomes, the optimal decision rule changes. We study
causal identification, statistical variance-reduced estimation, and robust
estimation of optimal treatment rules, including under potential violations of
positivity. We consider fairness constraints such as demographic parity in
treatment take-up, and other constraints, via constrained optimization. Our
framework can be extended to handle algorithmic recommendations under an
often-reasonable covariate-conditional exclusion restriction, using our
robustness checks for lack of positivity in the recommendation. We develop a
two-stage algorithm for solving over parametrized policy classes under general
constraints to obtain variance-sensitive regret bounds. We illustrate the
methods in two case studies based on data from randomized encouragement to
enroll in insurance and from pretrial supervised release with electronic
monitoring.
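To make the abstract's ingredients concrete, the following is a minimal, self-contained sketch (not the paper's algorithm): it combines a doubly-robust (AIPW) estimate of an encouragement policy's value with an inverse-propensity-weighted estimate of the gap in treatment take-up between two groups, and penalizes that gap while searching a tiny class of threshold policies. The synthetic data, the crude arm-mean outcome model, the penalty weight, and all variable names are illustrative assumptions.

```python
# A minimal sketch, assuming a randomized encouragement design with covariates X,
# group indicator A, encouragement Z, take-up T, and outcome Y. It is NOT the
# paper's algorithm: the outcome model, penalty weight, and policy class are toy
# illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic encouragement-design data (illustrative only).
X = rng.normal(size=(n, 2))                    # covariates
A = rng.integers(0, 2, size=n)                 # protected group indicator
Z = rng.integers(0, 2, size=n)                 # randomized encouragement, P(Z=1)=0.5
p_take = 1.0 / (1.0 + np.exp(-(0.5 * Z + 0.3 * X[:, 0] - 0.4 * A)))
T = rng.binomial(1, p_take)                    # observed take-up of the treatment
Y = 1.0 * T + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)  # outcome

e_z = np.full(n, 0.5)                          # known randomization propensity P(Z=1|X)

# Crude plug-in outcome model: mean outcome within each encouragement arm.
# A real analysis would regress Y on (X, Z) flexibly and cross-fit.
mu1 = np.full(n, Y[Z == 1].mean())
mu0 = np.full(n, Y[Z == 0].mean())


def aipw_value(pi):
    """Doubly-robust (AIPW) estimate of E[Y(pi(X))] for a binary encouragement policy."""
    mu_pi = np.where(pi == 1, mu1, mu0)
    prop = np.where(pi == 1, e_z, 1.0 - e_z)
    return np.mean(mu_pi + (Z == pi) * (Y - mu_pi) / prop)


def takeup_parity_gap(pi):
    """IPW estimate of |P(T=1 | A=0, policy pi) - P(T=1 | A=1, policy pi)|."""
    w = (Z == pi) / np.where(pi == 1, e_z, 1.0 - e_z)
    rates = []
    for a in (0, 1):
        m = A == a
        rates.append(np.sum(w[m] * T[m]) / np.sum(w[m]))
    return abs(rates[0] - rates[1])


# Penalized search over a tiny class of threshold policies pi(x) = 1{x_0 >= c}:
# estimated value minus a fixed multiple of the take-up parity gap.
lam = 2.0
best_score, best_c = -np.inf, None
for c in np.linspace(-2.0, 2.0, 41):
    pi = (X[:, 0] >= c).astype(int)
    score = aipw_value(pi) - lam * takeup_parity_gap(pi)
    if score > best_score:
        best_score, best_c = score, c

print(f"best threshold c = {best_c:.2f}, penalized value estimate = {best_score:.3f}")
```

A real analysis in the paper's setting would replace the arm-mean outcome model with flexible, cross-fit regression and would handle the parity constraint through the two-stage constrained-optimization procedure the abstract describes, rather than a fixed penalty.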
Related papers
- Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing [13.34215548232296]
Counterfactual fairness (CF) offers a promising statistical tool grounded in causal inference to formulate and study fairness.
We theoretically characterize the optimal CF policy and prove its stationarity, which greatly simplifies the search for optimal CF policies.
We prove, and validate through simulations, that our policy learning approach controls unfairness while attaining optimal value.
arXiv Detail & Related papers (2025-01-10T22:27:44Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it (a toy baseline-correction sketch appears after this related-papers list).
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- Privacy Preserving Adaptive Experiment Design [13.839525385976303]
We investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.
We propose differentially private algorithms that still match the lower bound, showing that privacy is "almost free".
arXiv Detail & Related papers (2024-01-16T09:22:12Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before the study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Policy Learning with Distributional Welfare [1.0742675209112622]
Most literature on treatment choice has considered utilitarian welfare based on the conditional average treatment effect (ATE).
This paper proposes an optimal policy that allocates the treatment based on the conditional quantile of individual treatment effects (QoTE).
arXiv Detail & Related papers (2023-11-27T14:51:30Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes.
By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods.
arXiv Detail & Related papers (2023-10-30T06:35:31Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator leveraging a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain, ex post, the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Treatment recommendation with distributional targets [0.0]
We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment.
The desirability of the outcome distribution resulting from the treatment recommendation is measured through a functional distributional characteristic.
We propose two (near)-optimal regret policies.
arXiv Detail & Related papers (2020-05-19T19:27:21Z)
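In the same illustrative spirit, the sketch below relates to the "Optimal Baseline Corrections for Off-Policy Contextual Bandits" entry above: it contrasts a plain inverse-propensity-scoring (IPS) estimate of a target policy's reward with a baseline-corrected (control-variate) variant that stays unbiased while reducing variance. The synthetic logging data and the fixed baseline are assumptions for illustration; that paper's closed-form variance-optimal correction is not reproduced here.

```python
# A minimal sketch, assuming context-free logging and target policies over k
# actions; it shows the standard control-variate idea behind baseline-corrected
# IPS, not the cited paper's optimal baseline.
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 3                                   # logged rounds, number of actions

p_log = np.array([0.5, 0.3, 0.2])                # logging policy
true_mean = np.array([0.2, 0.5, 0.8])            # per-action expected reward (unknown in practice)

a = rng.choice(k, size=n, p=p_log)               # logged actions
r = rng.binomial(1, true_mean[a]).astype(float)  # logged rewards

p_tgt = np.array([0.1, 0.2, 0.7])                # target policy to evaluate offline

w = p_tgt[a] / p_log[a]                          # importance weights

ips = np.mean(w * r)                             # plain IPS estimate of E_target[r]

# Baseline correction: subtract a fixed constant b and add it back. Because
# E[w] = 1, the estimator stays unbiased for any fixed b, and a b close to the
# mean reward shrinks the variance of the weighted term.
b = 0.5                                          # illustrative fixed baseline
ips_corrected = b + np.mean(w * (r - b))

print(f"IPS: {ips:.3f}   baseline-corrected IPS: {ips_corrected:.3f}")
print(f"target policy's true value: {float(p_tgt @ true_mean):.3f}")
```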