Learning treatment effects while treating those in need
- URL: http://arxiv.org/abs/2407.07596v1
- Date: Wed, 10 Jul 2024 12:29:46 GMT
- Title: Learning treatment effects while treating those in need
- Authors: Bryan Wilder, Pim Welle
- Abstract summary: We propose a framework to design randomized allocation rules which optimally balance targeting high-need individuals with learning treatment effects.
We apply our framework to data from human services in Allegheny County, Pennsylvania.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many social programs attempt to allocate scarce resources to people with the greatest need. Indeed, public services increasingly use algorithmic risk assessments motivated by this goal. However, targeting the highest-need recipients often conflicts with attempting to evaluate the causal effect of the program as a whole, as the best evaluations would be obtained by randomizing the allocation. We propose a framework to design randomized allocation rules which optimally balance targeting high-need individuals with learning treatment effects, presenting policymakers with a Pareto frontier between the two goals. We give sample complexity guarantees for the policy learning problem and provide a computationally efficient strategy to implement it. We then apply our framework to data from human services in Allegheny County, Pennsylvania. Optimized policies can substantially mitigate the tradeoff between learning and targeting. For example, it is often possible to obtain 90% of the optimal utility in targeting high-need individuals while ensuring that the average treatment effect can be estimated with less than 2 times the samples that a randomized controlled trial would require. Mechanisms for targeting public services often focus on measuring need as accurately as possible. However, our results suggest that algorithmic systems in public services can be most impactful if they incorporate program evaluation as an explicit goal alongside targeting.
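To make the targeting-versus-learning tradeoff concrete, here is a minimal sketch, not the paper's optimization: a randomized allocation rule that linearly interpolates between uniform RCT-style assignment and deterministic top-k targeting. The need scores, the interpolation parameter `lam`, and the variance proxy `1/(p(1-p))` for IPW-based ATE estimation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 200                 # population size, treatment slots
need = rng.gamma(2.0, 1.0, n)    # hypothetical need scores (illustrative)

def allocation_probs(need, k, lam, floor=0.02):
    """Blend a top-k deterministic rule with uniform randomization.

    lam=0 -> uniform RCT-style assignment (p_i = k/n);
    lam=1 -> (nearly) deterministic targeting of the k highest-need people.
    Probabilities are clipped away from {0, 1} so IPW stays well-defined.
    """
    topk = np.zeros(len(need))
    topk[np.argsort(need)[-k:]] = 1.0
    p = (1 - lam) * (k / len(need)) + lam * topk
    return np.clip(p, floor, 1 - floor)

q = k / n  # RCT propensity
for lam in [0.0, 0.5, 0.8, 0.95]:
    p = allocation_probs(need, k, lam)
    util = p @ need / np.sort(need)[-k:].sum()   # fraction of top-k utility
    # Crude proxy for IPW ATE variance: E[1/(p(1-p))]; its ratio to the RCT
    # value approximates how many times more samples are needed than an RCT.
    inflation = np.mean(1 / (p * (1 - p))) / (1 / (q * (1 - q)))
    print(f"lam={lam:.2f}  targeting={util:.2f}  sample inflation={inflation:.2f}")
```

Sweeping `lam` traces a crude Pareto frontier of the kind the paper describes: moderate values retain most of the targeting utility while keeping the evaluation sample-size inflation bounded.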
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of common baseline-correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
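A hedged sketch of the control-variate idea behind baseline corrections: subtracting a constant baseline b from the rewards in an IPS estimator and adding it back preserves unbiasedness, and minimizing the empirical variance in b has a closed form. The synthetic log and the moment formula below follow a textbook derivation, not necessarily the estimator derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Schematic logged-bandit data (illustrative stand-in, not a real log):
mu = rng.uniform(0.1, 0.9, n)                          # logging propensities
pi = np.clip(mu + rng.normal(0, 0.1, n), 0.05, 0.95)   # target propensities
r = rng.binomial(1, 0.3 + 0.4 * mu)                    # logged rewards
w = pi / mu                                            # importance weights

def ips_with_baseline(w, r, b):
    # V_hat(b) = mean(w * (r - b)) + b; in a real bandit log E_mu[w] = 1,
    # so this stays unbiased for any constant b.
    return np.mean(w * (r - b)) + b

# Minimizing Var(w*(r-b))/n over b yields the closed form
#   b* = (E[w^2 r] - E[w r]) / (E[w^2] - E[w]),
# estimated here with empirical moments.
b_star = (np.mean(w**2 * r) - np.mean(w * r)) / (np.mean(w**2) - np.mean(w))

for b in [0.0, b_star]:
    est = ips_with_baseline(w, r, b)
    var = np.var(w * (r - b)) / n
    print(f"baseline={b:.3f}  estimate={est:.4f}  variance={var:.2e}")
```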
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
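As a rough illustration of dimensionality reduction over many outcomes, the sketch below uses plain PCA to collapse hypothetical per-person effect estimates into a scalar targeting index. The low-rank data-generating process and the top-20% rule are assumptions for illustration, not the paper's methodology.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n, n_outcomes = 5000, 10

# Hypothetical per-person effect estimates on many outcomes (e.g. earnings,
# health, housing stability), generated with shared low-rank structure.
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, n_outcomes))
effects = latent @ loadings + 0.1 * rng.normal(size=(n, n_outcomes))

# Reduce the outcome dimension, then target on the leading component:
# a stand-in for learning a policy on a data-driven scalar index rather
# than hand-picking one outcome or fixing weights a priori.
pca = PCA(n_components=1).fit(effects)
index = pca.transform(effects).ravel()
treat = index > np.quantile(index, 0.8)   # treat the top 20% by the index

print("explained variance of leading component:", pca.explained_variance_ratio_[0])
print("treated:", treat.sum())
```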
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z)
- Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal [5.044100238869374]
We analyze the value of targeting in a large-scale field experiment with over 53,000 college students.
We show that targeting based on low baseline outcomes is most effective in our specific application.
arXiv Detail & Related papers (2023-10-12T19:08:45Z)
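The contrast between predictive and causal targeting can be sketched on synthetic data: one rule ranks by a model of the untreated (baseline) outcome, the other by a T-learner estimate of the treatment effect. The data-generating process below is invented for illustration, and in it the two rules disagree; the paper finds low-baseline targeting most effective in its particular field setting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, n)           # randomized assignment in the experiment
baseline = X[:, 0]                    # drives the untreated outcome
tau = 0.5 + 0.5 * X[:, 1]             # true effect, driven by a different feature
Y = baseline + T * tau + rng.normal(0, 1, n)

# Predictive targeting: rank by predicted baseline (untreated) outcome.
m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
risk = m0.predict(X)

# Causal targeting: T-learner CATE estimate m1(x) - m0(x).
m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
cate = m1.predict(X) - m0.predict(X)

# Compare true effects among each rule's top 20%.
k = int(0.2 * n)
by_risk = np.argsort(risk)[:k]        # lowest predicted baseline outcome
by_cate = np.argsort(cate)[-k:]       # highest estimated treatment effect
print("mean true effect, low-baseline targeting:", tau[by_risk].mean())
print("mean true effect, CATE targeting:        ", tau[by_cate].mean())
```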
- Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
arXiv Detail & Related papers (2023-09-12T20:45:30Z)
- Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
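A generic sketch of the "distort parameters before sharing" idea: each coordinate of a client update gets Gaussian noise whose scale is set by a per-parameter privacy budget. The `distort` helper and the `sensitivity / eps` calibration are simplifications invented for illustration, not the framework from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def distort(params, sensitivity, eps_per_param):
    """Add parameter-wise Gaussian noise before sharing an update.

    Larger eps (weaker privacy) -> less noise -> higher utility; the scale
    can differ per parameter, per client, and per communication round.
    """
    sigma = sensitivity / np.asarray(eps_per_param)
    return params + rng.normal(0.0, sigma, size=params.shape)

update = rng.normal(size=8)               # a client's model update
eps = np.linspace(0.5, 4.0, update.size)  # per-parameter budgets (illustrative)
shared = distort(update, sensitivity=1.0, eps_per_param=eps)
print(np.round(shared - update, 3))       # the injected distortion shrinks as eps grows
```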
- Quasi-optimal Reinforcement Learning with Continuous Actions [8.17049210746654]
We develop a novel quasi-optimal learning algorithm, which can be easily optimized in off-policy settings.
We evaluate our algorithm with comprehensive simulated experiments and a real dose-suggestion application to the Ohio Type 1 Diabetes dataset.
arXiv Detail & Related papers (2023-01-21T11:30:13Z)
- Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
- Inherent Trade-offs in the Fair Allocation of Treatments [2.6143568807090696]
Explicit and implicit bias clouds human judgement, leading to discriminatory treatment of minority groups.
We propose a causal framework that learns optimal intervention policies from data subject to fairness constraints.
arXiv Detail & Related papers (2020-10-30T17:55:00Z)
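One toy way to impose a fairness constraint on an allocation policy: split a fixed treatment budget across groups in proportion to group size, equalizing treatment rates, then treat each group's highest estimated effects. The synthetic effect estimates and the demographic-parity constraint below are illustrative assumptions, not the causal framework from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
group = rng.binomial(1, 0.3, n)                # protected attribute (illustrative)
cate = rng.normal(0.5, 1.0, n) + 0.3 * group   # estimated individual effects
budget = 200

# Unconstrained rule: treat the global top-k by estimated effect.
unconstrained = np.zeros(n, bool)
unconstrained[np.argsort(cate)[-budget:]] = True

# Parity-constrained rule: allocate the budget proportionally to group size,
# then take each group's top slice, equalizing treatment *rates*.
constrained = np.zeros(n, bool)
for g in (0, 1):
    idx = np.where(group == g)[0]
    k_g = int(round(budget * len(idx) / n))
    constrained[idx[np.argsort(cate[idx])[-k_g:]]] = True

for name, rule in [("unconstrained", unconstrained), ("parity", constrained)]:
    rates = [round(float(rule[group == g].mean()), 3) for g in (0, 1)]
    print(f"{name}: total effect={cate[rule].sum():.1f}  treatment rates={rates}")
```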
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)