(Machine) Learning What Policies Value
- URL: http://arxiv.org/abs/2206.00727v1
- Date: Wed, 1 Jun 2022 19:33:09 GMT
- Title: (Machine) Learning What Policies Value
- Authors: Daniel Björkegren, Joshua E. Blumenstock, Samsun Knight
- Abstract summary: This paper develops a method to uncover the values consistent with observed allocation decisions.
We use machine learning methods to estimate how much each individual benefits from an intervention.
We demonstrate this approach by analyzing Mexico's PROGRESA anti-poverty program.
- Score: 2.0267847227859144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When a policy prioritizes one person over another, is it because they benefit
more, or because they are preferred? This paper develops a method to uncover
the values consistent with observed allocation decisions. We use machine
learning methods to estimate how much each individual benefits from an
intervention, and then reconcile its allocation with (i) the welfare weights
assigned to different people; (ii) heterogeneous treatment effects of the
intervention; and (iii) weights on different outcomes. We demonstrate this
approach by analyzing Mexico's PROGRESA anti-poverty program. The analysis
reveals that while the program prioritized certain subgroups -- such as
indigenous households -- the fact that those groups benefited more implies that
they were in fact assigned a lower welfare weight. The PROGRESA case
illustrates how the method makes it possible to audit existing policies, and to
design future policies that better align with values.
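To make the recipe concrete, here is a minimal sketch, not the paper's estimator: a T-learner stands in for the machine-learning step that estimates each household's benefit, and relative welfare weights are backed out under a stylized budget-constrained planner model in which group g is treated whenever w_g * tau_i clears a common shadow price. The helper names and the planner model are illustrative assumptions.

```python
# Illustrative sketch only: T-learner for heterogeneous treatment effects plus a
# stylized planner model for backing out relative welfare weights. The function
# names and the planner model are assumptions, not the paper's estimator.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def estimate_cate(X, treated, y):
    """T-learner: fit separate outcome models for treated and control units."""
    m1 = GradientBoostingRegressor().fit(X[treated == 1], y[treated == 1])
    m0 = GradientBoostingRegressor().fit(X[treated == 0], y[treated == 0])
    return m1.predict(X) - m0.predict(X)  # estimated benefit tau_i for every unit

def implied_relative_weights(tau_hat, treated, group, q=0.05):
    """Stylized planner: unit i is treated when w_g * tau_i clears a common shadow
    price, so the marginal treated benefit in each group pins down w_g up to scale
    (assumes that marginal benefit is positive)."""
    weights = {}
    for g in np.unique(group):
        marginal = np.quantile(tau_hat[(group == g) & (treated == 1)], q)
        weights[g] = 1.0 / marginal  # larger marginal benefit -> lower implied weight
    norm = max(weights.values())
    return {g: w / norm for g, w in weights.items()}
```

Under this stylized model, a group whose treated members show a large marginal benefit receives a low implied welfare weight, which is the pattern the abstract reports for indigenous households in PROGRESA.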
Related papers
- Structural Interventions and the Dynamics of Inequality [0.0]
We show that technical solutions must be paired with external, context-aware interventions to enact social change.
This research highlights the ways that structural inequality can be perpetuated by seemingly unbiased decision mechanisms.
arXiv Detail & Related papers (2024-06-03T13:44:38Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from IS, enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
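As background for the entry above, a minimal sketch of the standard importance-sampling correction used in off-policy policy gradients; logp_target, logp_behavior, and grad_logp_target are assumed callbacks, and the paper's active selection of the behavior policy is not reproduced.

```python
# Illustrative sketch only: off-policy REINFORCE with per-trajectory importance
# weights. logp_target, logp_behavior and grad_logp_target are assumed callbacks
# returning per-step log-probabilities / score-function gradients.
import numpy as np

def is_policy_gradient(trajectories, logp_target, logp_behavior, grad_logp_target):
    grads = []
    for states, actions, rewards in trajectories:
        ret = np.sum(rewards)
        # likelihood ratio of the whole trajectory, computed in log space for stability
        log_w = np.sum(logp_target(states, actions) - logp_behavior(states, actions))
        score = np.sum(grad_logp_target(states, actions), axis=0)  # summed score gradients
        grads.append(np.exp(log_w) * ret * score)
    # the variance of this average depends on which behavior policy logged the data
    return np.mean(grads, axis=0)
```

The variance of this estimate depends on the behavior policy that collected the trajectories, which is the quantity the paper's active importance sampling seeks to minimize.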
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
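A minimal sketch of the generic reduce-then-learn pattern the entry above describes, with PCA and a T-learner standing in for the paper's methodology (both are assumptions for illustration).

```python
# Illustrative sketch only: compress many outcomes into a leading principal component
# and target treatment on the estimated effect for that index. PCA and the T-learner
# are stand-ins, not the paper's methodology.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

def reduced_rank_policy(X, treated, Y):
    """X: covariates, treated: 0/1 assignment, Y: (n_units, n_outcomes) outcome matrix."""
    index = PCA(n_components=1).fit_transform(Y)[:, 0]  # scalar welfare index
    m1 = RandomForestRegressor().fit(X[treated == 1], index[treated == 1])
    m0 = RandomForestRegressor().fit(X[treated == 0], index[treated == 0])
    tau = m1.predict(X) - m0.predict(X)
    return tau > 0  # treat units whose estimated effect on the index is positive
```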
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
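For context on the entry above, a minimal sketch of the vanilla inverse-propensity-scoring (IPS) baseline that Policy Convolution refines; pi_target and pi_logging are assumed callables returning action probabilities, and the convolution over latent action structure is not reproduced here.

```python
# Illustrative sketch only: the vanilla IPS baseline that Policy Convolution refines.
# pi_target and pi_logging are assumed callables returning action probabilities.
import numpy as np

def ips_value(contexts, actions, rewards, pi_target, pi_logging):
    """Estimate the target policy's value from logged (context, action, reward) triples."""
    ratios = np.array([pi_target(a, x) / pi_logging(a, x)
                       for x, a in zip(contexts, actions)])
    return float(np.mean(ratios * np.asarray(rewards)))
```

With large action spaces these likelihood ratios become extreme, which is the failure mode the paper addresses by convolving the logging and target policies over latent action structure.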
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
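One concrete disparity metric such an evaluation might report, as a minimal sketch over already-computed predictions rather than the paper's taxonomy or the CLIP models themselves.

```python
# Illustrative sketch only: the gap in positive-prediction rates across demographic
# groups, one of many disparity metrics a bias evaluation could report.
import numpy as np

def selection_rate_gap(predictions, groups, positive_label=1):
    rates = {g: float(np.mean(predictions[groups == g] == positive_label))
             for g in np.unique(groups)}
    return max(rates.values()) - min(rates.values()), rates
```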
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from histories of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
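A minimal sketch of one way to turn policy embeddings into a dispersion score, assuming a kernel log-determinant as the diversity measure; the transformer that produces the embeddings and the paper's exact dispersion-matrix construction are not reproduced.

```python
# Illustrative sketch only: score the diversity of a set of policy embeddings with a
# kernel log-determinant; larger values mean a more dispersed policy set.
import numpy as np

def dispersion_logdet(embeddings, bandwidth=1.0):
    """embeddings: (n_policies, d) array of policy representations."""
    sq_dists = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1)
    similarity = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # pairwise similarity matrix
    _, logdet = np.linalg.slogdet(similarity)
    return logdet
```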
- Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation [60.71312668265873]
We develop a method to balance the need for personalization with confident predictions.
We show that our method can be used to form accurate predictions of heterogeneous treatment effects.
arXiv Detail & Related papers (2021-11-28T23:19:12Z)
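A minimal sketch of the underlying trade-off in the entry above: pooling units into quantile bins of the estimated treatment effect trades personalization for more confident, better-averaged predictions. The binning scheme is an illustrative assumption, not the paper's method.

```python
# Illustrative sketch only: pool units into quantile bins of the estimated effect so
# each reported number averages over enough data to be confident.
import numpy as np

def subgroup_effects(tau_hat, n_groups=4):
    edges = np.quantile(tau_hat, np.linspace(0, 1, n_groups + 1))
    bins = np.clip(np.digitize(tau_hat, edges[1:-1]), 0, n_groups - 1)
    report = {}
    for b in range(n_groups):
        vals = tau_hat[bins == b]
        report[b] = (vals.mean(), vals.std(ddof=1) / np.sqrt(len(vals)))  # mean, std. error
    return report
```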
- Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies [3.855085732184416]
Off-policy evaluation is a key component of reinforcement learning: it evaluates a target policy with offline data collected from behavior policies.
This paper discusses how to correctly mix estimators produced by different behavior policies.
Experiments on simulated recommender systems show that our methods are effective in reducing the Mean-Square Error of estimation.
arXiv Detail & Related papers (2020-11-29T12:57:54Z)
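For intuition on mixing estimators, a minimal sketch of classical inverse-variance weighting, which minimizes the variance of a combination of unbiased estimates; the paper derives the appropriate weights for off-policy estimators specifically, which this sketch does not reproduce.

```python
# Illustrative sketch only: classical inverse-variance weighting for combining
# unbiased estimates of the same quantity.
import numpy as np

def inverse_variance_mix(estimates, variances):
    w = 1.0 / np.asarray(variances, dtype=float)
    w /= w.sum()
    return float(np.dot(w, estimates)), w
```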
- Fair Policy Targeting [0.6091702876917281]
A major concern when targeting interventions to individuals in social welfare programs is discrimination.
This paper addresses the question of the design of fair and efficient treatment allocation rules.
arXiv Detail & Related papers (2020-05-25T20:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.