Related papers: The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing

The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing

URL: http://arxiv.org/abs/2403.01420v3
Date: Tue, 19 Nov 2024 06:10:32 GMT
Title: The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing
Authors: Yang Xu, Yihong Gu, Cong Fang,
Abstract summary: This paper studies the implicit bias of Gradient Descent (SGD) over heterogeneous data and shows that the implicit bias drives the model learning towards an invariant solution. Specifically, we theoretically investigate the multi-environment low-rank matrix sensing problem where in each environment, the signal comprises (i) a lower-rank invariant part shared across all environments; and (ii) a significantly varying environment-dependent spurious component. The key insight is, through simply employing the large step size large-batch SGD sequentially in each environment without any explicit regularization, the oscillation caused by heterogeneity can provably prevent model learning spurious signals.
Score: 9.551225697705199
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Models are expected to engage in invariance learning, which involves distinguishing the core relations that remain consistent across varying environments to ensure the predictions are safe, robust and fair. While existing works consider specific algorithms to realize invariance learning, we show that model has the potential to learn invariance through standard training procedures. In other words, this paper studies the implicit bias of Stochastic Gradient Descent (SGD) over heterogeneous data and shows that the implicit bias drives the model learning towards an invariant solution. We call the phenomenon the implicit invariance learning. Specifically, we theoretically investigate the multi-environment low-rank matrix sensing problem where in each environment, the signal comprises (i) a lower-rank invariant part shared across all environments; and (ii) a significantly varying environment-dependent spurious component. The key insight is, through simply employing the large step size large-batch SGD sequentially in each environment without any explicit regularization, the oscillation caused by heterogeneity can provably prevent model learning spurious signals. The model reaches the invariant solution after certain iterations. In contrast, model learned using pooled SGD over all data would simultaneously learn both the invariant and spurious signals. Overall, we unveil another implicit bias that is a result of the symbiosis between the heterogeneity of data and modern algorithms, which is, to the best of our knowledge, first in the literature.

Related papers

Raising the Bar in Graph OOD Generalization: Invariant Learning Beyond Explicit Environment Modeling [58.15601237755505]
Real-world graph data often exhibit diverse and shifting environments that traditional models fail to generalize across. We propose a novel method termed Multi-Prototype Hyperspherical Invariant Learning (MPHIL) MPHIL achieves state-of-the-art performance, significantly outperforming existing methods across graph data from various domains and with different distribution shifts.
arXiv Detail & Related papers (2025-02-15T07:40:14Z)
Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data. We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB) OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications. OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
arXiv Detail & Related papers (2024-03-26T16:40:08Z)
Effective Causal Discovery under Identifiable Heteroscedastic Noise Model [45.98718860540588]
Causal DAG learning has recently achieved promising performance in terms of both accuracy and efficiency. We propose a novel formulation for DAG learning that accounts for the variation in noise variance across variables and observations. We then propose an effective two-phase iterative DAG learning algorithm to address the increasing optimization difficulties.
arXiv Detail & Related papers (2023-12-20T08:51:58Z)
Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift [50.98086766507025]
We propose a simple-yet-effective data augmentation strategy, Adversarial Invariant Augmentation (AIA) AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process.
arXiv Detail & Related papers (2022-11-05T07:55:55Z)
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data [65.42329520528223]
We show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing. We propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotation in conventional IRM.
arXiv Detail & Related papers (2022-07-25T15:26:19Z)
On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data. Invariance measures consistency of model predictions on transformations of the data. From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
Differentiable Invariant Causal Discovery [106.87950048845308]
Learning causal structure from observational data is a fundamental challenge in machine learning. This paper proposes Differentiable Invariant Causal Discovery (DICD) to avoid learning spurious edges and wrong causal directions. Extensive experiments on synthetic and real-world datasets verify that DICD outperforms state-of-the-art causal discovery methods up to 36% in SHD.
arXiv Detail & Related papers (2022-05-31T09:29:07Z)
Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset. Our method involves two separate latent subspaces for the target property and the remaining input information. We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
Kernelized Heterogeneous Risk Minimization [25.5458915855661]
We propose a Kernelized Heterogeneous Risk Minimization (KerHRM) algorithm, which achieves both the latent exploration and invariant learning in kernel space. We theoretically justify our algorithm and empirically validate the effectiveness of our algorithm with extensive experiments.
arXiv Detail & Related papers (2021-10-24T12:26:50Z)
Heterogeneous Risk Minimization [25.5458915855661]
Invariant learning methods for out-of-distribution generalization have been proposed by leveraging multiple training environments to find invariant relationships. Modern datasets are assembled by merging data from multiple sources without explicit source labels. We propose Heterogeneous Risk Minimization (HRM) framework to achieve joint learning of latent heterogeneity among the data and invariant relationship.
arXiv Detail & Related papers (2021-05-09T02:51:36Z)
Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting. We show identifiability of the data representation up to very simple transformations. Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.