Implicit bias as a Gauge correction: Theory and Inverse Design
- URL: http://arxiv.org/abs/2601.06597v1
- Date: Sat, 10 Jan 2026 15:33:09 GMT
- Title: Implicit bias as a Gauge correction: Theory and Inverse Design
- Authors: Nicola Aladrah, Emanuele Ballarin, Matteo Biagetti, Alessio Ansuini, Alberto d'Onofrio, Fabio Anselmi
- Abstract summary: A central problem in machine learning theory is to characterize how learning dynamics select particular solutions compatible with the training objective. We identify a general mechanism, in terms of an explicit correction of the learning dynamics, for the emergence of implicit biases. We compute the resulting induced bias for a range of dynamics, showing how several well-known results fit into a single unified framework.
- Score: 2.9379512315137117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central problem in machine learning theory is to characterize how learning dynamics select particular solutions among the many compatible with the training objective; this phenomenon, called implicit bias, remains only partially characterized. In the present work, we identify a general mechanism for the emergence of implicit biases, expressed as an explicit geometric correction of the learning dynamics and arising from the interaction between continuous symmetries in the model's parameterization and stochasticity in the optimization process. Our viewpoint is constructive in two complementary directions: given model symmetries, one can derive the implicit bias they induce; conversely, one can inverse-design a wide class of different implicit biases by computing specific redundant parameterizations. More precisely, we show that, when the dynamics is expressed in the quotient space obtained by factoring out the symmetry group of the parameterization, the resulting stochastic differential equation gains a closed-form geometric correction in the stationary distribution of the optimizer dynamics, favoring orbits with small local volume. We compute the resulting symmetry-induced bias for a range of architectures, showing how several well-known results fit into a single unified framework. The approach also provides a practical methodology for deriving implicit biases in new settings, and it yields concrete, testable predictions that we confirm by numerical simulations on toy models trained on synthetic data, leaving more complex scenarios for future work. Finally, we test the implicit-bias inverse-design procedure in notable cases, including biases toward sparsity in linear features or in spectral properties of the model parameters.
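As a toy illustration of the abstract's two ingredients, a continuous symmetry plus optimization noise, the sketch below is our own construction, not the authors' code: the scalar redundant parameterization w = u·v has a loss invariant under the rescaling (u, v) → (cu, v/c), and label-noise SGD contracts the imbalance u² − v², drifting each iterate along its symmetry orbit toward the balanced, minimal-norm representative, in the spirit of the claimed bias toward orbits of small local volume.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = 2.0                 # ground-truth value of the product u * v
u, v = 4.0, 0.5              # unbalanced start on the orbit u * v = 2
lr, sigma = 1e-2, 0.5        # step size and label-noise scale

for _ in range(200_000):
    noisy_target = w_star + sigma * rng.standard_normal()
    r = u * v - noisy_target                  # noisy residual
    # gradient step on 0.5 * r^2; the noise enters multiplicatively,
    # which steadily contracts the imbalance u^2 - v^2
    u, v = u - lr * r * v, v - lr * r * u

print(f"u*v = {u * v:.3f}, |u| = {abs(u):.3f}, |v| = {abs(v):.3f}")
```

Starting from the unbalanced point (4, 0.5), the printout shows |u| ≈ |v| ≈ √2 while the product u·v stays near 2: the noise has selected a preferred point on the orbit.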
Related papers
- Loss Landscape Geometry and the Learning of Symmetries: Or, What Influence Functions Reveal About Robust Generalization [0.14201057456467273]
We study how neural emulators internalize physical symmetries by introducing an influence-based diagnostic. This quantity probes the local geometry of the learned loss landscape. We show that orbit-wise gradient coherence provides the mechanism for learning to generalize over symmetry transformations.
arXiv Detail & Related papers (2026-01-28T02:14:01Z)
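A hypothetical reading of the orbit-wise diagnostic sketched above: compare per-sample loss gradients along a symmetry orbit of one input and report their mean pairwise cosine similarity. The tiny model, the rotation group, and all names below are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(2)                   # tiny linear model f(x) = w @ x

def grad(w, x, y):
    return (w @ x - y) * x                   # gradient of 0.5 * (w @ x - y)^2

def rotate(x, theta):                        # the symmetry: 2-D rotation
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

x, y = rng.standard_normal(2), 1.0           # one sample with an invariant label
orbit = [rotate(x, t) for t in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
g = np.array([grad(w, xi, y) for xi in orbit])
g /= np.linalg.norm(g, axis=1, keepdims=True)

print(f"orbit-wise gradient coherence: {(g @ g.T).mean():.3f}")
```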
- From Coefficients to Directions: Rethinking Model Merging with Directional Alignment [66.99062575537555]
We introduce a unified geometric framework, Merging with Directional Alignment, which aligns directional structures consistently in both the parameter and feature spaces. Our analysis shows that directional alignment improves structural coherence, and extensive experiments across benchmarks, model scales, and task configurations further validate the effectiveness of our approach.
arXiv Detail & Related papers (2025-11-29T08:40:58Z)
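A minimal sketch of what merging "from coefficients to directions" could look like: decompose each task vector into magnitude and direction, average the directions, then rescale. The decomposition and the averaging rule are our assumptions for illustration; the paper's actual alignment procedure differs in detail.

```python
import numpy as np

def merge_directional(w_base, w_a, w_b):
    """Merge two fine-tuned models by averaging task-vector directions."""
    ta, tb = w_a - w_base, w_b - w_base          # task vectors
    da, db = ta / np.linalg.norm(ta), tb / np.linalg.norm(tb)
    d = da + db
    d /= np.linalg.norm(d)                       # merged unit direction
    scale = 0.5 * (np.linalg.norm(ta) + np.linalg.norm(tb))
    return w_base + scale * d                    # step along the aligned direction

rng = np.random.default_rng(2)
w0 = rng.standard_normal(4)                      # pretrained weights
wa = w0 + rng.standard_normal(4)                 # fine-tuned on task A
wb = w0 + rng.standard_normal(4)                 # fine-tuned on task B
print("coefficient average:", np.round(0.5 * (wa + wb), 3))
print("directional merge:  ", np.round(merge_directional(w0, wa, wb), 3))
```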
- SEAL - A Symmetry EncourAging Loss for High Energy Physics [0.005211875900848231]
Building machine learning models that explicitly respect symmetries can be difficult due to the dedicated components required. We introduce soft constraints that allow the model to decide the importance of added symmetries during the learning process instead of enforcing exact symmetries.
arXiv Detail & Related papers (2025-11-03T19:00:13Z)
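In its simplest form, the soft-constraint idea above reduces to adding a weighted symmetry-consistency penalty to the task loss. The sketch below is our reading of the abstract, not the SEAL implementation: the penalty weight lam is fixed here, whereas SEAL lets the training process decide how much each symmetry matters.

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal(3)        # tiny linear model f(x) = w @ x
lam = 0.5                         # symmetry-penalty weight (learnable in SEAL)
lr = 1e-2

for _ in range(20_000):
    x = rng.standard_normal(3)
    y = x[0]                      # a target that breaks the symmetry below
    xt = x[[1, 2, 0]]             # symmetry T: cyclic permutation of features
    res_task = w @ x - y
    res_sym = w @ x - w @ xt      # symmetry residual f(x) - f(Tx)
    # SGD on the soft-constrained loss 0.5*res_task^2 + lam*0.5*res_sym^2
    w -= lr * (res_task * x + lam * res_sym * (x - xt))

print(f"w = {np.round(w, 2)}")
```

With lam = 0 the fit concentrates on the first feature, w ≈ (1, 0, 0); very large lam pushes w toward the permutation-invariant direction; lam = 0.5 lands softly in between, near (0.6, 0.2, 0.2).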
- Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems [2.720699926154399]
We propose a generative framework based on flow matching to model the full probability distribution over bifurcation outcomes. We validate our approach on a range of systems, from toy models to complex physical problems such as buckling beams and the Allen-Cahn equation.
arXiv Detail & Related papers (2025-09-03T14:18:05Z)
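For context, the flow-matching objective behind such frameworks regresses a velocity field onto x1 − x0 along linear interpolation paths xt = (1 − t)·x0 + t·x1. The sketch below is a deliberately stripped-down 1-D version: a least-squares fit over fixed polynomial features stands in for a neural network, and the paper's equivariance machinery is omitted. It only shows mass from a Gaussian base splitting onto two symmetric "branches" at ±1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
x1 = rng.choice([-1.0, 1.0], size=n)       # bimodal data: two symmetric branches
x0 = rng.standard_normal(n)                # base (noise) samples
t = rng.uniform(size=n)
xt = (1 - t) * x0 + t * x1                 # points on the interpolation path
v_target = x1 - x0                         # conditional flow-matching target

def feats(x, t):                           # fixed polynomial features in (x, t)
    return np.stack([np.ones_like(x), x, t, x * t, x**3, x**3 * t], axis=1)

theta, *_ = np.linalg.lstsq(feats(xt, t), v_target, rcond=None)

# integrate dx/dt = v(x, t) from noise to data with 100 Euler steps
x = rng.standard_normal(2000)
for k in range(100):
    tk = np.full_like(x, k / 100)
    x += 0.01 * (feats(x, tk) @ theta)

print(f"fraction on +1 branch: {np.mean(x > 0):.2f}, mean |x|: {np.abs(x).mean():.2f}")
```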
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
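The quantity behind linear mode connectivity is easy to state: the excess loss along the straight segment between two parameter vectors. The sketch below is our toy, with a sign-flip symmetry standing in for the richer symmetry classes the paper studies; it measures the barrier before and after aligning that symmetry.

```python
import numpy as np

def barrier(loss, w_a, w_b, steps=21):
    """Max excess loss on the segment between w_a and w_b over the endpoint mean."""
    alphas = np.linspace(0.0, 1.0, steps)
    path = [loss((1 - a) * w_a + a * w_b) for a in alphas]
    return max(path) - 0.5 * (loss(w_a) + loss(w_b))

# toy loss with two minima related by a sign-flip symmetry
loss = lambda w: float((w @ w - 1.0) ** 2)
w_a, w_b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])

print(f"barrier before alignment: {barrier(loss, w_a, w_b):.2f}")   # 1.00
print(f"barrier after alignment:  {barrier(loss, w_a, -w_b):.2f}")  # 0.00
```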
- Relative Representations: Topological and Geometric Perspectives [50.85040046976025]
Relative representations are an established approach to zero-shot model stitching. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
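A minimal sketch of relative representations, with the normalization step reflecting our reading of the invariance claim above: standardizing each latent dimension before taking cosines to anchors makes the representation invariant to per-dimension rescalings and permutations of the latent space. Function names and the toy data are ours.

```python
import numpy as np

def relative_rep(Z, anchor_idx):
    """Cosine similarities to anchors, after per-dimension standardization."""
    Zn = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)   # the normalization step
    Zn /= np.linalg.norm(Zn, axis=1, keepdims=True)
    return Zn @ Zn[anchor_idx].T

rng = np.random.default_rng(5)
Z1 = rng.standard_normal((100, 16))              # embeddings from space 1
scales = rng.uniform(0.5, 2.0, size=16)          # non-isotropic rescaling ...
perm = rng.permutation(16)                       # ... plus a permutation
Z2 = (Z1 * scales)[:, perm]                      # same data, transformed space
idx = rng.choice(100, size=8, replace=False)     # shared anchor samples

r1, r2 = relative_rep(Z1, idx), relative_rep(Z2, idx)
print(f"max discrepancy across spaces: {np.abs(r1 - r2).max():.2e}")  # ~1e-16
```

The two representations agree to machine precision even though the second latent space is a rescaled, permuted copy of the first.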
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression, using the basic tools of random matrix theory and free probability in a treatment aimed at readers with backgrounds in physics and deep learning. Our results extend and unify earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
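A numerical companion to the object studied above, under assumptions of our choosing (power-law covariance spectrum, aligned source, fixed ridge): the simulated excess risk of ridge regression falls roughly as a power law in the sample size, the kind of curve the paper computes analytically.

```python
import numpy as np

rng = np.random.default_rng(6)
d, ridge = 500, 1e-3
spec = np.arange(1, d + 1, dtype=float) ** -1.0     # power-law covariance spectrum
w_star = rng.standard_normal(d) * np.sqrt(spec)     # source aligned with the spectrum

def excess_risk(n):
    X = rng.standard_normal((n, d)) * np.sqrt(spec) # features with covariance diag(spec)
    y = X @ w_star + 0.1 * rng.standard_normal(n)
    w = np.linalg.solve(X.T @ X + ridge * n * np.eye(d), X.T @ y)
    return float(np.sum(spec * (w - w_star) ** 2))  # population excess risk

for n in [50, 100, 200, 400, 800]:
    print(f"n = {n:4d}   excess risk = {excess_risk(n):.4f}")
```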
- On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data [26.310275682709776]
Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data. We show that logit-adjusted parameterizations can be appropriately tuned to learn irrespective of the minority imbalance ratio.
arXiv Detail & Related papers (2023-03-14T03:04:37Z)
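The logit-adjusted parameterizations discussed above have a standard additive form: shift each logit by a multiple of the log class prior, demanding larger margins on rare classes. The sketch below implements that recipe; the temperature tau and the toy priors are our illustrative choices.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, priors, tau=1.0):
    """Mean cross-entropy with each logit shifted by tau * log class prior."""
    z = logits + tau * np.log(priors)               # the logit adjustment
    z -= z.max(axis=1, keepdims=True)               # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(7)
logits = rng.standard_normal((8, 3))
labels = rng.integers(0, 3, size=8)
priors = np.array([0.90, 0.09, 0.01])               # heavy label imbalance
print(f"plain CE (tau=0):          {logit_adjusted_ce(logits, labels, priors, 0.0):.3f}")
print(f"logit-adjusted CE (tau=1): {logit_adjusted_ce(logits, labels, priors, 1.0):.3f}")
```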
- On the Influence of Enforcing Model Identifiability on Learning Dynamics of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models. Our method enforces model identifiability during training. We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z)
- Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds [18.60761407945024]
In learning dynamics, singularities can act as attractors on the learning trajectory and, therefore, negatively influence the convergence speed of models. We propose a general approach to circumvent the problem arising from singularities by using stratifolds. We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and thereby improve the convergence speed of learning.
arXiv Detail & Related papers (2021-12-07T14:42:45Z)
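A toy version of the comparison reported above: near the singular point a = b = 0 of the product parameterization, plain gradient descent stalls, while a damped natural-gradient step escapes quickly. The damping of the (degenerate) Fisher matrix is our stand-in for the paper's smooth stratifold approximation, not the actual construction.

```python
import numpy as np

def run(natural, steps=60, lr=0.1, eps=1e-2):
    a, b = 1e-4, 1e-4                    # start next to the singularity a = b = 0
    for _ in range(steps):
        r = a * b - 1.0                  # residual of L = 0.5 * (a*b - 1)^2
        g = np.array([r * b, r * a])
        if natural:
            F = np.array([[b * b, a * b],
                          [a * b, a * a]])                # Fisher matrix (rank one)
            g = np.linalg.solve(F + eps * np.eye(2), g)   # damped natural gradient
        a, b = a - lr * g[0], b - lr * g[1]
    return a * b

print(f"plain GD:          a*b = {run(natural=False):.4f}")   # stalls near 0
print(f"damped natural GD: a*b = {run(natural=True):.4f}")    # reaches ~1
```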
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
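For reference, the classifier analyzed above combines many linear discriminants, each fit in a random low-dimensional projection of the data. The sketch below estimates its misclassification rate by held-out counting, which is exactly the expensive step the paper's consistent estimator is designed to replace; the Gaussian data model and all parameters are our choices.

```python
import numpy as np

rng = np.random.default_rng(8)
d, k, members = 100, 5, 25
mu = np.zeros(d)
mu[:10] = 0.8                                       # class means are +/- mu

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    return y[:, None] * mu + rng.standard_normal((n, d)), y

Xtr, ytr = sample(400)
Xte, yte = sample(2000)

scores = np.zeros(len(yte))
for _ in range(members):
    P = rng.standard_normal((d, k)) / np.sqrt(k)    # random projection to k dims
    Z, Zt = Xtr @ P, Xte @ P
    m1, m0 = Z[ytr > 0].mean(axis=0), Z[ytr < 0].mean(axis=0)
    S = np.cov(Z, rowvar=False) + 1e-6 * np.eye(k)  # (approximate) pooled covariance
    w = np.linalg.solve(S, m1 - m0)                 # linear discriminant direction
    scores += Zt @ w - 0.5 * (m1 + m0) @ w          # this member's vote

print(f"ensemble misclassification rate: {np.mean(np.sign(scores) != yte):.3f}")
```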