Predictors from causal features do not generalize better to new domains
- URL: http://arxiv.org/abs/2402.09891v1
- Date: Thu, 15 Feb 2024 11:34:38 GMT
- Title: Predictors from causal features do not generalize better to new domains
- Authors: Vivian Y. Nastl and Moritz Hardt
- Abstract summary: We study how well machine learning models trained on causal features generalize across domains.
Our goal is to test the hypothesis that models trained on causal features generalize better across domains.
- Score: 18.95420918106124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study how well machine learning models trained on causal features
generalize across domains. We consider 16 prediction tasks on tabular datasets
covering applications in health, employment, education, social benefits, and
politics. Each dataset comes with multiple domains, allowing us to test how
well a model trained in one domain performs in another. For each prediction
task, we select features that have a causal influence on the target of
prediction. Our goal is to test the hypothesis that models trained on causal
features generalize better across domains. Without exception, we find that
predictors using all available features, regardless of causality, have better
in-domain and out-of-domain accuracy than predictors using causal features.
Moreover, even the absolute drop in accuracy from one domain to the other is no
better for causal predictors than for models that use all features. If the goal
is to generalize to new domains, practitioners might as well train the best
possible model on all available features.
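The comparison described in the abstract can be sketched as a simple protocol: fit a predictor in a source domain using (a) a causal feature subset and (b) all available features, then measure both in-domain and out-of-domain accuracy. The sketch below uses synthetic data and a hypothetical causal/non-causal feature split, not the paper's actual datasets or models.

```python
# Hedged sketch of the evaluation protocol (synthetic data, hypothetical
# feature split): train in a source domain on causal features only vs. all
# features, then compare in-domain and out-of-domain accuracy.
import numpy as np

rng = np.random.default_rng(0)

def make_domain(n, noise_scale):
    # Two causal features drive the label; a third, non-causal feature is
    # merely correlated with it, with domain-dependent noise.
    causal = rng.normal(size=(n, 2))
    y = (causal.sum(axis=1) > 0).astype(float)
    extra = y + rng.normal(scale=noise_scale, size=n)
    return np.column_stack([causal, extra]), y

def fit(X, y):
    # Least-squares linear classifier with an intercept.
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(((Xb @ w > 0.5) == y).mean())

X_src, y_src = make_domain(4000, noise_scale=0.5)   # training domain
X_tgt, y_tgt = make_domain(4000, noise_scale=2.0)   # shifted target domain

results = {}
for name, idx in [("causal", [0, 1]), ("all", [0, 1, 2])]:
    w = fit(X_src[:, idx], y_src)
    results[name] = {
        "in_domain": accuracy(w, X_src[:, idx], y_src),
        "out_of_domain": accuracy(w, X_tgt[:, idx], y_tgt),
    }
print(results)
```

The paper's finding is about the empirical outcome of exactly this kind of comparison across 16 real tasks; this snippet only illustrates the bookkeeping, not the result.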
Related papers
- Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior.
We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z) - Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased [0.0]
Imbalanced binary classification problems arise in many fields of study.
It is common to subsample the majority class to create a (more) balanced dataset for model training.
This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data.
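The prevalence bias from balanced subsampling can be made concrete with a toy example: with known Gaussian class-conditionals, a posterior computed under a balanced training prior badly overestimates a 5% true prevalence, while rescaling the posterior odds back to the true prior removes the bias. All distributions and priors below are illustrative.

```python
# Illustrative sketch: training on a balanced subsample inflates prevalence
# estimates; correcting the posterior odds for the true prior fixes this.
import numpy as np

rng = np.random.default_rng(1)
pi_true, pi_train = 0.05, 0.5          # true prevalence vs balanced training prior
n = 200_000

y = rng.random(n) < pi_true            # rare positive class
x = rng.normal(loc=np.where(y, 1.0, -1.0))

def posterior(x, prior):
    # Bayes posterior for N(+1, 1) vs N(-1, 1) class-conditionals.
    like1 = np.exp(-0.5 * (x - 1.0) ** 2)
    like0 = np.exp(-0.5 * (x + 1.0) ** 2)
    return like1 * prior / (like1 * prior + like0 * (1.0 - prior))

p_balanced = posterior(x, pi_train)
naive_prevalence = p_balanced.mean()   # upwardly biased

# Rescale the odds from the training prior back to the true prior.
odds = (p_balanced / (1 - p_balanced)
        * (pi_true / (1 - pi_true)) / (pi_train / (1 - pi_train)))
corrected_prevalence = (odds / (1 + odds)).mean()
print(naive_prevalence, corrected_prevalence)
```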
arXiv Detail & Related papers (2024-12-17T19:38:29Z) - ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Out-of-Domain Robustness via Targeted Augmentations [90.94290420322457]
We study principles for designing data augmentations for out-of-domain generalization.
Motivated by theoretical analysis on a linear setting, we propose targeted augmentations.
We show that targeted augmentations set a new state of the art for OOD performance, improving it by 3.2 to 15.2 percentage points.
arXiv Detail & Related papers (2023-02-23T08:59:56Z) - Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z) - Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation [29.623692599892365]
Trajectory prediction is one of the essential tasks for autonomous vehicles.
Recent progress in machine learning has given rise to a series of advanced trajectory prediction algorithms.
arXiv Detail & Related papers (2022-08-06T20:19:52Z) - Model Optimization in Imbalanced Regression [2.580765958706854]
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain.
One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values.
Recently, an evaluation metric was introduced: the Squared Error Relevance Area (SERA).
This metric places greater emphasis on the errors committed at extreme values while also accounting for performance across the overall domain of the target variable.
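From this description, SERA can be sketched as follows: for each relevance threshold t in [0, 1], sum the squared errors of the cases whose relevance is at least t, then integrate over t. The relevance function is domain-defined; the sketch below takes it as a precomputed array, and the implementation details (grid size, trapezoidal rule) are choices of this sketch, not of the original metric definition.

```python
# Sketch of the SERA metric: integrate, over relevance thresholds t in [0, 1],
# the sum of squared errors of the cases whose relevance is at least t.
import numpy as np

def sera(y_true, y_pred, relevance, steps=1000):
    ts = np.linspace(0.0, 1.0, steps + 1)
    sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    rel = np.asarray(relevance)
    ser = np.array([sq_err[rel >= t].sum() for t in ts])
    # Trapezoidal integration over the uniform threshold grid.
    return float((0.5 * (ser[0] + ser[-1]) + ser[1:-1].sum()) / steps)
```

An identical error on a high-relevance (extreme) case contributes at every threshold, while the same-size error on a low-relevance case contributes only near t = 0, which is what makes the metric emphasize rare extreme values.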
arXiv Detail & Related papers (2022-06-20T20:23:56Z) - Selective Prediction via Training Dynamics [31.708701583736644]
We show that state-of-the-art selective prediction performance can be attained solely from studying the training dynamics of a model.
In particular, we reject data points exhibiting too much disagreement with the final prediction at late stages in training.
The proposed rejection mechanism is domain-agnostic (i.e., it works for both discrete and real-valued prediction) and can be flexibly combined with existing selective prediction approaches.
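The rejection idea described here can be sketched in a few lines: score each point by how often predictions from late-training checkpoints disagree with the final prediction, and abstain above a threshold. The checkpoint data below is hypothetical.

```python
# Minimal sketch of selective prediction from training dynamics: abstain on
# points whose late-checkpoint predictions disagree with the final one.
import numpy as np

def disagreement(checkpoint_preds, final_preds):
    # checkpoint_preds: (n_checkpoints, n_points) labels from late checkpoints.
    cp = np.asarray(checkpoint_preds)
    fp = np.asarray(final_preds)
    return (cp != fp[None, :]).mean(axis=0)

def selective_predict(final_preds, scores, threshold):
    # Return the final prediction, or None (abstain) where checkpoints
    # disagreed with it too often during training.
    return [p if s <= threshold else None
            for p, s in zip(final_preds, scores)]

scores = disagreement([[0, 1, 1], [0, 0, 1], [0, 1, 1]], [0, 1, 1])
preds = selective_predict([0, 1, 1], scores, threshold=0.2)
```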
arXiv Detail & Related papers (2022-05-26T17:51:29Z) - Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
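The idea of treating feature statistics as random rather than deterministic can be sketched as follows: compute per-instance channel statistics, estimate their spread across the batch, then resample them with matching noise during training. This is a simplified numpy sketch on (batch, channels, length) feature maps, not the paper's actual module.

```python
# Sketch: resample per-instance feature statistics with noise scaled by their
# batch-level uncertainty, synthesizing plausible domain shifts in training.
import numpy as np

def uncertain_stats_augment(x, rng, eps=1e-6):
    mu = x.mean(axis=2, keepdims=True)          # (B, C, 1) instance means
    sig = x.std(axis=2, keepdims=True) + eps    # (B, C, 1) instance stds
    sig_mu = mu.std(axis=0, keepdims=True)      # batch-level spread of the means
    sig_sig = sig.std(axis=0, keepdims=True)    # ... and of the stds
    mu_new = mu + rng.standard_normal(mu.shape) * sig_mu
    sig_new = sig + rng.standard_normal(sig.shape) * sig_sig
    # Normalize with the original stats, then re-style with the sampled ones.
    return (x - mu) / sig * sig_new + mu_new

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 4, 16))
out = uncertain_stats_augment(x, rng)
```

When every instance in the batch has identical statistics, the estimated uncertainty is zero and the features pass through unchanged, so the perturbation only acts where the batch itself exhibits statistical diversity.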
arXiv Detail & Related papers (2022-02-08T16:09:12Z) - Out-of-Distribution Generalization Analysis via Influence Function [25.80365416547478]
The mismatch between training and target data is one major challenge for machine learning systems.
We introduce Influence Function, a classical tool from robust statistics, into the OOD generalization problem.
We show that the accuracy on test domains and the proposed index together can help us discern whether OOD algorithms are needed and whether a model achieves good OOD generalization.
arXiv Detail & Related papers (2021-01-21T09:59:55Z) - Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration [39.685626118667074]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial-based technique to calibrate the information extracted by the two models.
For natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales.
arXiv Detail & Related papers (2020-12-16T11:54:15Z) - Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts.
We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z) - Estimating Generalization under Distribution Shifts via Domain-Invariant Representations [75.74928159249225]
We use a set of domain-invariant predictors as a proxy for the unknown, true target labels.
The error of the resulting risk estimate depends on the target risk of the proxy model.
arXiv Detail & Related papers (2020-07-06T17:21:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.