Related papers: Understanding Learning Invariance in Deep Linear Networks

Understanding Learning Invariance in Deep Linear Networks

URL: http://arxiv.org/abs/2506.13714v1
Date: Mon, 16 Jun 2025 17:24:07 GMT
Title: Understanding Learning Invariance in Deep Linear Networks
Authors: Hao Duan, Guido Montúfar,
Abstract summary: Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency.<n>We provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring.<n>We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum.
Score: 15.335716956682203
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can perform comparably to explicitly invariant models, theoretical insights remain scarce. In this paper, we provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We focus on mean squared error regression with deep linear networks, which parametrize rank-bounded linear maps and can be hard-wired to be invariant to specific group actions. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum. By contrast, regularization introduces additional critical points, though they remain saddles except for the global optimum. Moreover, we demonstrate that the regularization path is continuous and converges to the hard-wired solution.

Related papers

Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data. We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization [10.009748368458409]
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
arXiv Detail & Related papers (2023-07-07T13:06:12Z)
Understanding Pathologies of Deep Heteroskedastic Regression [25.509884677111344]
Heteroskedastic models predict both mean and residual noise for each data point. At one extreme, these models fit all training data perfectly, eliminating residual noise entirely. At the other, they overfit the residual noise while predicting a constant, uninformative mean. We observe a lack of middle ground, suggesting a phase transition dependent on model regularization strength.
arXiv Detail & Related papers (2023-06-29T06:31:27Z)
On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data [26.310275682709776]
Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE large models on labelimbalanced data. We show that logit-adjusted parameterizations can be appropriately tuned to learn to learn irrespective of the minority imbalance ratio.
arXiv Detail & Related papers (2023-03-14T03:04:37Z)
Data Augmentation vs. Equivariant Networks: A Theory of Generalization on Dynamics Forecasting [24.363954435050264]
Exploiting symmetry in dynamical systems is a powerful way to improve the generalization of deep learning. Data augmentation and equivariant networks are two major approaches to injecting symmetry into learning. We derive the generalization bounds for data augmentation and equivariant networks, characterizing their effect on learning in a unified framework.
arXiv Detail & Related papers (2022-06-19T17:00:12Z)
Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. Motivated by our analysis, we propose an approach that periodically and efficiently optimize the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference. We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks. In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings. We show that the generic approach we identified (squared $ell$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization [55.278153228758434]
Real-world datasets are heteroskedastic and imbalanced. Addressing heteroskedasticity and imbalance simultaneously is under-explored. We propose a data-dependent regularization technique for heteroskedastic datasets.
arXiv Detail & Related papers (2020-06-29T01:09:50Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.