Related papers: The geometry of invariant learning: an information-theoretic analysis of data augmentation and generalization

URL: http://arxiv.org/abs/2602.14423v1
Date: Mon, 16 Feb 2026 03:18:39 GMT
Title: The geometry of invariant learning: an information-theoretic analysis of data augmentation and generalization
Authors: Abdelali Bouyahia, Frédéric LeBlanc, Mario Marchand,
Abstract summary: We propose an information-theoretic framework that systematically accounts for the effect of augmentation on generalization and invariance learning. Our approach builds upon mutual information-based bounds, which relate the generalization gap to the amount of information a learning algorithm retains about its training data. Under mild sub-Gaussian assumptions on the loss function and the augmentation process, we derive a new generalization bound that decomposes the expected generalization gap into three interpretable terms.
Score: 2.496574213989531
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data augmentation is one of the most widely used techniques to improve generalization in modern machine learning, often justified by its ability to promote invariance to label-irrelevant transformations. However, its theoretical role remains only partially understood. In this work, we propose an information-theoretic framework that systematically accounts for the effect of augmentation on generalization and invariance learning. Our approach builds upon mutual information-based bounds, which relate the generalization gap to the amount of information a learning algorithm retains about its training data. We extend this framework by modeling the augmented distribution as a composition of the original data distribution with a distribution over transformations, which naturally induces an orbit-averaged loss function. Under mild sub-Gaussian assumptions on the loss function and the augmentation process, we derive a new generalization bound that decomposes the expected generalization gap into three interpretable terms: (1) a distributional divergence between the original and augmented data, (2) a stability term measuring the algorithm's dependence on training data, and (3) a sensitivity term capturing the effect of augmentation variability. To connect our bounds to the geometry of the augmentation group, we introduce the notion of group diameter, defined as the maximal perturbation that augmentations can induce in the input space. The group diameter provides a unified control parameter that bounds all three terms and highlights an intrinsic trade-off: small diameters preserve data fidelity but offer limited regularization, while large diameters enhance stability at the cost of increased bias and sensitivity. We validate our theoretical bounds with numerical experiments, demonstrating that they reliably track and predict the behavior of the true generalization gap.
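
Illustration (not from the paper): the two geometric quantities named in the abstract, the orbit-averaged loss and the group diameter, can be sketched numerically. The snippet below is a minimal sketch under stated assumptions: the augmentation family is taken to be planar rotations with angles drawn uniformly from a bounded interval, the loss is a squared loss on a linear model, and all function names (rotation, orbit_averaged_loss, empirical_group_diameter) are hypothetical. The paper's exact definitions and bound may differ.

```python
import numpy as np

def rotation(angle):
    """2x2 rotation matrix; stands in for one augmentation g (assumption)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def orbit_averaged_loss(loss, w, x, y, angles):
    """Monte Carlo estimate of E_g[loss(w, g.x, y)] over sampled rotations."""
    return float(np.mean([loss(w, rotation(a) @ x, y) for a in angles]))

def empirical_group_diameter(X, angles):
    """Largest input-space displacement ||g.x - x|| over sampled g and data x."""
    return max(
        float(np.linalg.norm(X @ rotation(a).T - X, axis=1).max())
        for a in angles
    )

def squared_loss(w, x, y):
    """Squared error of a linear predictor (illustrative choice of loss)."""
    return float((w @ x - y) ** 2)

# Synthetic data: labels come from a fixed linear model (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w = np.array([1.0, -0.5])
y = X @ w

# Wider angle ranges give a larger diameter, i.e. stronger augmentation.
max_angle = 0.3
angles = rng.uniform(-max_angle, max_angle, size=200)

avg_loss = orbit_averaged_loss(squared_loss, w, X[0], y[0], angles)
diameter = empirical_group_diameter(X, angles)
print(avg_loss, diameter)
```

For rotations, the displacement of a point of norm r under an angle a is 2 r |sin(a/2)|, so the empirical diameter grows with the angle range. This mirrors the trade-off described in the abstract: a small range keeps augmented points close to the data, while a large range moves them further away.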

Related papers

Generalization Below the Edge of Stability: The Role of Data Geometry [60.147710896851045]
We show how data geometry controls generalization in ReLU networks trained below the edge of stability. For data distributions supported on a mixture of low-dimensional balls, we derive generalization bounds that provably adapt to the intrinsic dimension. Our results consolidate disparate empirical findings that have appeared in the literature.
arXiv Detail & Related papers (2025-10-20T21:40:36Z)
Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks [24.2043855572415]
We find distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients. We extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account the mean of the ground truth distribution.
arXiv Detail & Related papers (2025-07-07T15:08:45Z)
Understanding Learning Invariance in Deep Linear Networks [15.335716956682203]
Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. We provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum.
arXiv Detail & Related papers (2025-06-16T17:24:07Z)
Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
A Mathematics Framework of Artificial Shifted Population Risk and Its Further Understanding Related to Consistency Regularization [7.944280447232545]
This paper introduces a more comprehensive mathematical framework for data augmentation. We establish that the expected risk of the shifted population is the sum of the original population risk and a gap term. The paper also provides a theoretical understanding of this gap, highlighting its negative effects on the early stages of training.
arXiv Detail & Related papers (2025-02-15T08:26:49Z)
Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data. We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
Data-dependent Generalization Bounds via Variable-Size Compressibility [16.2444595840653]
We establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework. In this framework, the generalization error of an algorithm is linked to a variable-size "compression rate" of its input data. The new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectation bounds.
arXiv Detail & Related papers (2023-03-09T16:17:45Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Robustness Implies Generalization via Data-Dependent Generalization Bounds [24.413499775513145]
This paper proves that robustness implies generalization via data-dependent generalization bounds. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable.
arXiv Detail & Related papers (2022-06-27T17:58:06Z)
Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective. We propose an efficient algorithm to search for the subset with minimal worst-case risk. The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
arXiv Detail & Related papers (2021-07-05T09:07:29Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.