Out-of-Variable Generalization for Discriminative Models
- URL: http://arxiv.org/abs/2304.07896v3
- Date: Thu, 8 Feb 2024 10:22:42 GMT
- Title: Out-of-Variable Generalization for Discriminative Models
- Authors: Siyuan Guo, Jonas Wildberger, Bernhard Schölkopf
- Abstract summary: In machine learning, the ability of an agent to do well in new environments is a critical aspect of intelligence.
We investigate $\textit{out-of-variable}$ generalization, which pertains to environments with variables that were never jointly observed before.
We propose a method that exhibits non-trivial out-of-variable generalization performance when facing an overlapping, yet distinct, set of causal predictors.
- Score: 13.075802230332298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of an agent to do well in new environments is a critical aspect
of intelligence. In machine learning, this ability is known as
$\textit{strong}$ or $\textit{out-of-distribution}$ generalization. However,
merely considering differences in data distributions is inadequate for fully
capturing differences between learning environments. In the present paper, we
investigate $\textit{out-of-variable}$ generalization, which pertains to an
agent's generalization capabilities concerning environments with variables that
were never jointly observed before. This skill closely reflects the process of
animate learning: we, too, explore Nature by probing, observing, and measuring
$\textit{subsets}$ of variables at any given time. Mathematically,
$\textit{out-of-variable}$ generalization requires the efficient re-use of past
marginal information, i.e., information over subsets of previously observed
variables. We study this problem, focusing on prediction tasks across
environments that contain overlapping, yet distinct, sets of causes. We show
that after fitting a classifier, the residual distribution in one environment
reveals the partial derivative of the true generating function with respect to
the unobserved causal parent in that environment. We leverage this information
and propose a method that exhibits non-trivial out-of-variable generalization
performance when facing an overlapping, yet distinct, set of causal predictors.
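A minimal numerical sketch of the residual claim, assuming a regression stand-in for the classifier, a linear (first-order) effect of the unobserved parent, and a known noise level; this is not the paper's estimator, and all variable names are illustrative.

```python
# Sketch only: Y = f(X1, X2, X3) + eps, with X3 never observed in the source
# environment. After regressing Y on (X1, X2), the residual is approximately
# (df/dx3) * (X3 - E[X3]) + eps, so its variance reveals |df/dx3|.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 20_000
x1, x2, x3 = rng.normal(size=(3, n))           # causal parents of Y
true_partial = 1.7                             # effect of the unobserved parent X3
y = np.sin(x1) + 0.5 * x2 + true_partial * x3 + 0.1 * rng.normal(size=n)

# Source environment: only (X1, X2, Y) are jointly observed.
model = GradientBoostingRegressor().fit(np.column_stack([x1, x2]), y)
residuals = y - model.predict(np.column_stack([x1, x2]))

# Var(residual) ~= (df/dx3)^2 * Var(X3) + Var(eps); solve for |df/dx3|,
# assuming the noise variance (0.1 ** 2 here) is known.
partial_estimate = np.sqrt(max(residuals.var() - 0.1 ** 2, 0.0) / x3.var())
print(f"true |df/dx3| = {true_partial:.2f}, recovered ~= {partial_estimate:.2f}")
```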
Related papers
- Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used to train a model differs from the data the model encounters in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
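As a hedged illustration of shift detection (not necessarily the procedure used in the brief), one can compare training data with deployment data feature by feature using a two-sample Kolmogorov-Smirnov test; the helper name and threshold below are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_shift(train, deployed, alpha=0.01):
    """Flag features whose marginal distribution differs between the two samples."""
    shifted = []
    for j in range(train.shape[1]):
        result = ks_2samp(train[:, j], deployed[:, j])
        if result.pvalue < alpha:               # small p-value: likely shifted
            shifted.append((j, result.statistic, result.pvalue))
    return shifted

rng = np.random.default_rng(0)
train = rng.normal(size=(5_000, 4))
deployed = rng.normal(size=(5_000, 4))
deployed[:, 2] += 0.5                           # inject a mean shift in feature 2
print(detect_feature_shift(train, deployed))    # expect feature 2 to be flagged
```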
arXiv Detail & Related papers (2024-05-23T05:29:36Z)
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution shift benchmarks and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z)
- Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift [50.98086766507025]
We propose a simple-yet-effective data augmentation strategy, Adversarial Invariant Augmentation (AIA).
AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process.
arXiv Detail & Related papers (2022-11-05T07:55:55Z)
- A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that the $\hat{Z}$ estimated by our method contains less redundant information than that of previous methods.
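The summary leaves the module's form open; the following is a hypothetical stand-in that scores whether two latent estimates $\hat{z}_i, \hat{z}_j$ share an environment by fitting a logistic model on $|\hat{z}_i - \hat{z}_j|$. The paper's actual architecture and training signal may differ.

```python
# Hypothetical stand-in for a "same environment?" head over latent estimates.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_envs, per_env, dim = 4, 50, 8
centers = rng.normal(scale=2.0, size=(n_envs, dim))
z = np.vstack([c + rng.normal(scale=0.5, size=(per_env, dim)) for c in centers])
env = np.repeat(np.arange(n_envs), per_env)     # ground-truth environment labels

pairs = list(combinations(range(len(z)), 2))
features = np.array([np.abs(z[i] - z[j]) for i, j in pairs])
labels = np.array([int(env[i] == env[j]) for i, j in pairs])

clf = LogisticRegression(max_iter=1000).fit(features, labels)
p_same = clf.predict_proba(np.abs(z[0] - z[1]).reshape(1, -1))[0, 1]
print(f"P(z_0 and z_1 share an environment) ~= {p_same:.2f}")
```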
arXiv Detail & Related papers (2022-06-09T15:01:36Z)
- Causal Transportability for Visual Recognition [70.13627281087325]
We show that standard classifiers fail because the association between images and labels is not transportable across settings.
We then show that the causal effect, which severs all sources of confounding, remains invariant across domains.
This motivates us to develop an algorithm to estimate the causal effect for image classification.
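The invariance argument concerns interventional quantities such as $P(y \mid \mathrm{do}(x))$. Below is a toy, tabular illustration of backdoor adjustment on synthetic binary data, contrasting the raw association with the adjusted effect; the paper's algorithm for image classification is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.3, size=n)                 # confounder
x = rng.binomial(1, 0.2 + 0.6 * z)               # treatment depends on z
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)     # outcome depends on x and z

# Naive association (not transportable across settings):
assoc = y[x == 1].mean() - y[x == 0].mean()

# Backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).
def p_y_do_x(x_val):
    return sum(y[(x == x_val) & (z == zv)].mean() * (z == zv).mean() for zv in (0, 1))

print(f"associational difference: {assoc:.3f}")
print(f"adjusted causal effect:   {p_y_do_x(1) - p_y_do_x(0):.3f}  (ground truth 0.30)")
```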
arXiv Detail & Related papers (2022-04-26T15:02:11Z)
- Learning to Transfer with von Neumann Conditional Divergence [14.926485055255942]
We introduce the recently proposed von Neumann conditional divergence to improve the transferability across multiple domains.
We design novel learning objectives assuming those source tasks are observed either simultaneously or sequentially.
In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
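For orientation only: the plain (unconditional) von Neumann divergence between positive-definite matrices $A$ and $B$ is $\operatorname{tr}(A \log A - A \log B - A + B)$. The sketch computes it on feature covariance matrices from two domains; the conditional variant and the learning objectives used in the paper are not reproduced here.

```python
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(A, B):
    """Bregman divergence of the von Neumann entropy: tr(A logA - A logB - A + B)."""
    return float(np.trace(A @ logm(A) - A @ logm(B) - A + B).real)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 5))                    # source-domain features
Xt = Xs @ np.diag([1.0, 1.2, 0.8, 1.0, 1.5])      # target-domain features (rescaled)
A = np.cov(Xs, rowvar=False) + 1e-3 * np.eye(5)   # regularize so logm is well-defined
B = np.cov(Xt, rowvar=False) + 1e-3 * np.eye(5)
print(f"D_vN(source || target) ~= {von_neumann_divergence(A, B):.3f}")
```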
arXiv Detail & Related papers (2021-08-07T22:18:23Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z)
- What causes the test error? Going beyond bias-variance via ANOVA [21.359033212191218]
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
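A toy version of the decomposition, assuming an arbitrary small model and two crossed sources of randomness (which training subsample is drawn, and how the network is initialized); the variance of the test error is split into two main effects and an interaction term as in standard two-way ANOVA. This is a schematic, not the paper's estimator.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1200, n_features=10, noise=5.0, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=400, random_state=0)

data_seeds, init_seeds = range(3), range(3)
err = np.zeros((len(data_seeds), len(init_seeds)))
for i, ds in enumerate(data_seeds):
    idx = np.random.default_rng(ds).choice(len(X_pool), size=400, replace=False)
    for j, ins in enumerate(init_seeds):
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=ins)
        model.fit(X_pool[idx], y_pool[idx])
        err[i, j] = np.mean((model.predict(X_test) - y_test) ** 2)

grand = err.mean()
var_data = np.var(err.mean(axis=1))          # main effect of the training subsample
var_init = np.var(err.mean(axis=0))          # main effect of the initialization
var_inter = np.var(err - err.mean(axis=1, keepdims=True)
                   - err.mean(axis=0, keepdims=True) + grand)
print(var_data, var_init, var_inter)
```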
arXiv Detail & Related papers (2020-10-11T05:21:13Z)
- Masking schemes for universal marginalisers [1.0412114420493723]
We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser.
We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties.
arXiv Detail & Related papers (2020-01-16T15:35:06Z)
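To illustrate the contrast, the sketch below builds a structure-agnostic mask (variables hidden uniformly at random) and a hypothetical structure-dependent one (hiding a target variable together with an assumed Markov blanket), then forms the usual values-plus-mask input for a marginaliser network. The specific structure-dependent schemes studied in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))                       # batch of 4 samples, 6 variables

def agnostic_mask(shape, p=0.5):
    """Bernoulli mask independent of any graph structure (1 = observed, 0 = hidden)."""
    return rng.binomial(1, p, size=shape)

def structure_mask(shape, markov_blankets, target=2):
    """Hypothetical structure-dependent mask: hide `target` and its assumed blanket."""
    mask = np.ones(shape, dtype=int)
    mask[:, [target] + markov_blankets[target]] = 0
    return mask

blankets = {2: [1, 3]}                            # assumed neighbourhood of variable 2
for mask in (agnostic_mask(X.shape), structure_mask(X.shape, blankets)):
    net_input = np.concatenate([X * mask, mask], axis=1)   # masked values + mask bits
    print(net_input.shape)                        # (4, 12): fed to the marginaliser
```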