Learning Robust Models Using The Principle of Independent Causal Mechanisms
- URL: http://arxiv.org/abs/2010.07167v2
- Date: Mon, 8 Feb 2021 15:39:13 GMT
- Title: Learning Robust Models Using The Principle of Independent Causal Mechanisms
- Authors: Jens Müller, Robert Schmier, Lynton Ardizzone, Carsten Rother and Ullrich Köthe
- Abstract summary: We propose a new gradient-based learning framework whose objective function is derived from the ICM principle.
We show theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments.
- Score: 26.79262903241044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard supervised learning breaks down under data distribution shift.
However, the principle of independent causal mechanisms (ICM, Peters et al.
(2017)) can turn this weakness into an opportunity: one can take advantage of
distribution shift between different environments during training in order to
obtain more robust models. We propose a new gradient-based learning framework
whose objective function is derived from the ICM principle. We show
theoretically and experimentally that neural networks trained in this framework
focus on relations remaining invariant across environments and ignore unstable
ones. Moreover, we prove that the recovered stable relations correspond to the
true causal mechanisms under certain conditions. In both regression and
classification, the resulting models generalize well to unseen scenarios where
traditionally trained models fail.
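The paper derives its exact objective from the ICM principle; the general recipe it instantiates, training jointly over several environments with a term that rewards cross-environment stability, can be sketched in a few lines. The following minimal PyTorch example uses the variance of per-environment risks as a stand-in invariance penalty; it is illustrative only, not the authors' loss, and all names are ours.

```python
# Illustrative multi-environment training loop (not the paper's exact objective).
# The invariance term here is the variance of per-environment risks, a simple
# stand-in for an ICM-derived penalty.
import torch
import torch.nn as nn

def invariant_loss(model, envs, penalty_weight):
    # envs: list of (x, y) batches, one per training environment
    risks = torch.stack([nn.functional.mse_loss(model(x), y) for x, y in envs])
    # The mean risk fits the data; the variance term favors relations whose
    # predictive error is stable across environments.
    return risks.mean() + penalty_weight * risks.var()

torch.manual_seed(0)
model = nn.Linear(2, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Two toy environments: feature 0 has a stable relation to y, feature 1 flips sign.
envs = []
for sign in (1.0, -1.0):
    x = torch.randn(256, 2)
    y = 2.0 * x[:, :1] + sign * 0.5 * x[:, 1:] + 0.1 * torch.randn(256, 1)
    envs.append((x, y))

for step in range(500):
    opt.zero_grad()
    invariant_loss(model, envs, penalty_weight=10.0).backward()
    opt.step()
```

With a large penalty weight, training is pushed toward the feature whose relation to the target holds in both toy environments.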
Related papers
- SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based Offline Reinforcement Learning trains policies on offline datasets together with a learned dynamics model.
This paper disentangles the problem into two key components: model bias and policy shift.
We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL).
arXiv Detail & Related papers (2024-08-23T04:25:09Z)
- Sequential Representation Learning via Static-Dynamic Conditional Disentanglement [58.19137637859017]
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos.
We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables.
Experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
arXiv Detail & Related papers (2024-08-10T17:04:39Z)
- Revisiting Spurious Correlation in Domain Generalization [12.745076668687748]
We build a structural causal model (SCM) to describe the causality within the data generation process.
We further conduct a thorough analysis of the mechanisms underlying spurious correlation.
On this basis, we propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator (a generic inverse-propensity-weighting sketch appears after this list).
arXiv Detail & Related papers (2024-06-17T13:22:00Z)
- Compete and Compose: Learning Independent Mechanisms for Modular World Models [57.94106862271727]
We present COMET, a modular world model which leverages reusable, independent mechanisms across different environments.
COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition.
We show that COMET is able to adapt to new environments with varying numbers of objects with improved sample efficiency compared to more conventional finetuning approaches.
arXiv Detail & Related papers (2024-04-23T15:03:37Z)
- Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor; to remedy this, we propose a relaxation via *partial invariance* (the IRM penalty being relaxed is sketched after this list).
Experiments in linear settings as well as with deep neural networks, on tasks over both language and image data, verify our conclusions.
arXiv Detail & Related papers (2023-01-28T02:48:14Z)
- How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with a linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
- Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis [7.895866278697778]
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data.
In reality, this assumption is almost always violated due to distribution shifts between environments.
We propose the Mechanism Shift Score (MSS), a score-based approach amenable to various empirical estimators.
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
- Bias-inducing geometries: an exactly solvable data model with fairness implications [13.690313475721094]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods that formalize the goal of recovering independent latent variables and provide estimation procedures for practical applications.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
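As referenced in the "Revisiting Spurious Correlation in Domain Generalization" entry above: that paper's propensity score weighted estimator is tied to its particular SCM, but generic inverse propensity weighting conveys the underlying idea of removing confounding bias by reweighting samples. A minimal sketch with names of our own choosing, not code from the paper:

```python
# Generic inverse-propensity-weighting (IPW) sketch for controlling
# confounding; illustrative only, not the paper's estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_effect(X, t, y):
    """Estimate E[Y | do(T=1)] - E[Y | do(T=0)] by reweighting samples
    with propensity scores e(x) = P(T=1 | X=x). t is a 0/1 array."""
    e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, 1e-3, 1 - 1e-3)  # clip to avoid extreme weights
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
```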
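And as referenced in the "Learning Optimal Features via Partial Invariance" entry: the constraint that paper relaxes is the IRMv1 penalty of Arjovsky et al. (2019), the squared gradient of each environment's risk with respect to a dummy scale fixed at 1. A minimal sketch, with names and shapes assumed by us:

```python
# IRMv1 penalty (Arjovsky et al., 2019). Names and shapes here are
# illustrative; this is not code from the Partial Invariance paper.
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, y):
    # y: float targets in {0., 1.}; logits: raw model scores
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * w, y)
    grad, = torch.autograd.grad(risk, [w], create_graph=True)
    return grad ** 2

def irm_objective(model, envs, lam=100.0):
    # envs: list of (x, y) batches, one per training environment
    total = 0.0
    for x, y in envs:
        logits = model(x).squeeze(-1)
        total = total + F.binary_cross_entropy_with_logits(logits, y)
        total = total + lam * irmv1_penalty(logits, y)
    return total / len(envs)
```

Partial invariance, roughly, relaxes the requirement that one such penalty hold across all environments simultaneously, e.g. by enforcing invariance only within partitions of the training environments.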
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.