Strategic Instrumental Variable Regression: Recovering Causal
Relationships From Strategic Responses
- URL: http://arxiv.org/abs/2107.05762v1
- Date: Mon, 12 Jul 2021 22:12:56 GMT
- Title: Strategic Instrumental Variable Regression: Recovering Causal
Relationships From Strategic Responses
- Authors: Keegan Harris, Daniel Ngo, Logan Stapleton, Hoda Heidari, Zhiwei
Steven Wu
- Abstract summary: We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict.
Our work establishes a novel connection between strategic responses to machine learning models and instrumental variable (IV) regression.
- Score: 16.874125120501944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning algorithms often prompt individuals to strategically modify
their observable attributes to receive more favorable predictions. As a result,
the distribution the predictive model is trained on may differ from the one it
operates on in deployment. While such distribution shifts, in general, hinder
accurate predictions, our work identifies a unique opportunity associated with
shifts due to strategic responses: We show that we can use strategic responses
effectively to recover causal relationships between the observable features and
outcomes we wish to predict. More specifically, we study a game-theoretic model
in which a principal deploys a sequence of models to predict an outcome of
interest (e.g., college GPA) for a sequence of strategic agents (e.g., college
applicants). In response, strategic agents invest efforts and modify their
features for better predictions. In such settings, unobserved confounding
variables can influence both an agent's observable features (e.g., high school
records) and outcomes. Therefore, standard regression methods generally produce
biased estimators. In order to address this issue, our work establishes a novel
connection between strategic responses to machine learning models and
instrumental variable (IV) regression, by observing that the sequence of
deployed models can be viewed as an instrument that affects agents' observable
features but does not directly influence their outcomes. Therefore, two-stage
least squares (2SLS) regression can recover the causal relationships between
observable features and outcomes. Beyond causal recovery, we can build on our
2SLS method to address two additional relevant optimization objectives: agent
outcome maximization and predictive risk minimization. Finally, our numerical
simulations on semi-synthetic data show that our methods significantly
outperform OLS regression in causal relationship estimation.
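
The core idea in the abstract, treating the sequence of deployed assessment rules as an instrument and applying two-stage least squares (2SLS), can be illustrated with a small simulation. The sketch below is not the authors' code: the linear agent-response model, the confounding strength, and all variable names are assumptions made for illustration.

```python
# Minimal 2SLS sketch on simulated strategic-response data.
# Assumed setup: agents shift their features linearly toward the deployed rule,
# and an unobserved confounder g affects both baseline features and outcomes.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5000, 3                              # number of agents (rounds) and feature dimension
theta_star = np.array([1.0, -0.5, 2.0])     # true causal effect of features on the outcome

# Unobserved confounder influences both baseline features and outcomes,
# which is what biases ordinary least squares.
g = rng.normal(size=T)
base_x = rng.normal(size=(T, d)) + 0.8 * g[:, None]

# The principal deploys a fresh rule theta_t each round; agents respond by
# shifting their observable features toward it (the rules act as the instrument).
theta_deployed = rng.normal(size=(T, d))
x = base_x + 0.5 * theta_deployed                  # observed (strategically modified) features
y = x @ theta_star + 1.5 * g + rng.normal(size=T)  # outcomes, confounded by g

# Stage 1: regress observed features on the deployed rules.
Z = np.hstack([np.ones((T, 1)), theta_deployed])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress outcomes on the first-stage fitted features.
X_hat = np.hstack([np.ones((T, 1)), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1:]

# Naive OLS on the observed features, for comparison.
X = np.hstack([np.ones((T, 1)), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1:]

print("true effect  :", theta_star)
print("2SLS estimate:", np.round(beta_2sls, 3))
print("OLS estimate :", np.round(beta_ols, 3))
```

In this simulation the OLS coefficients drift away from the true effect because of the confounder, while the 2SLS coefficients stay close to it, mirroring the OLS-versus-2SLS comparison described in the abstract.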
Related papers
- Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions [10.448052192725168]
This paper investigates the long-term impacts when machine learning models are retrained with model-annotated samples.
We find that agents are increasingly likely to receive positive decisions as the model gets retrained.
We propose a refined retraining process to stabilize the dynamics.
arXiv Detail & Related papers (2024-05-12T13:36:58Z)
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- Learning from Aggregate Responses: Instance Level versus Bag Level Loss Functions [23.32422115080128]
In many practical applications the training data is aggregated before being shared with the learner, in order to protect privacy of users' sensitive responses.
We study two natural loss functions for learning from aggregate responses: bag-level loss and the instance-level loss.
We propose a mechanism for differentially private learning from aggregate responses and derive the optimal bag size in terms of prediction risk-privacy trade-off.
arXiv Detail & Related papers (2024-01-20T02:14:11Z)
- Linked shrinkage to improve estimation of interaction effects in regression models [0.0]
We develop an estimator that adapts well to two-way interaction terms in a regression model.
We evaluate the potential of the model for inference, which is notoriously hard for selection strategies.
Our models can be very competitive with a more advanced machine learner, like random forest, even for fairly large sample sizes.
arXiv Detail & Related papers (2023-09-25T10:03:39Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- When No-Rejection Learning is Consistent for Regression with Rejection [11.244583592648443]
This paper investigates a no-rejection learning strategy that uses all the data to learn the prediction.
arXiv Detail & Related papers (2023-07-06T11:43:22Z)
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models.
In this work, we introduce a locally adaptive interpretable regression (LoAIR).
Our model achieves comparable or better predictive performance than the other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.