Softly Induced Functional Simplicity: Implications for Neural Network Generalisation, Robustness, and Distillation
- URL: http://arxiv.org/abs/2601.06584v2
- Date: Thu, 15 Jan 2026 10:40:18 GMT
- Title: Softly Induced Functional Simplicity: Implications for Neural Network Generalisation, Robustness, and Distillation
- Authors: Maciej Glowacki
- Abstract summary: Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning. We show that a soft symmetry-respecting inductive bias creates approximate degeneracies in the loss, which we identify as pseudo-Goldstone modes. Our results demonstrate that solutions of lower complexity give rise to abstractions that are more generalisable, robust, and efficiently distillable.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning and its applications to high-energy physics (HEP). Solutions of lower functional complexity are known to produce abstractions that generalise more effectively and are more robust to input perturbations. In complex hypothesis spaces, inductive biases make such solutions learnable by shaping the loss geometry during optimisation. In a HEP classification task, we show that a soft symmetry-respecting inductive bias creates approximate degeneracies in the loss, which we identify as pseudo-Goldstone modes. We quantify functional complexity using metrics derived from first-principles Hessian analysis and via compressibility. Our results demonstrate that solutions of lower complexity give rise to abstractions that are more generalisable, robust, and efficiently distillable.
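As a rough illustration of how a Hessian-based complexity proxy can be computed at a trained minimum (a minimal sketch with a toy linear classifier; the model, data, and the spectral summaries are illustrative assumptions, not the paper's metrics):

```python
# Sketch: quantify "functional complexity" of a trained model via the
# eigenvalue spectrum of the loss Hessian at the found minimum.
import torch

torch.manual_seed(0)

# Toy binary-classification data standing in for a HEP classification task.
X = torch.randn(256, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()

def loss_fn(w):
    # Cross-entropy of a linear classifier; any differentiable model works here.
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)

# Train a simple linear classifier to (near) convergence.
w = torch.zeros(4, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss_fn(w).backward()
    opt.step()

# Hessian of the loss at the minimum and its eigenvalue spectrum.
H = torch.autograd.functional.hessian(loss_fn, w.detach())
eigvals = torch.linalg.eigvalsh(H)

# Flat (near-zero) directions correspond to approximate degeneracies;
# spectral summaries act as crude proxies for functional complexity.
p = eigvals.clamp(min=1e-12)
p = p / p.sum()
effective_rank = torch.exp(-(p * p.log()).sum())
print(f"trace(H) = {eigvals.sum().item():.4f}")
print(f"effective rank of H ~ {effective_rank.item():.2f}")
```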
Related papers
- Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network [84.22115118596741]
We propose an end-to-end deep learning (DL) framework with low inference complexity for symbol-level precoding. We show that the proposed framework captures substantial performance gains of optimal SLP, while achieving an approximately 80-times speedup over conventional methods.
arXiv Detail & Related papers (2025-10-02T15:15:50Z) - Identifying Causal Direction via Variational Bayesian Compression [6.928582707713723]
A key principle is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity. We propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths.
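A very small illustration of the codelength asymmetry behind the algorithmic Markov condition (a sketch only; the synthetic cause-effect pair and the BIC-style codelengths are assumptions standing in for the paper's variational Bayesian neural networks):

```python
# Sketch: the factorisation that follows the causal direction should admit a
# shorter description. BIC for simple polynomial fits stands in for the
# paper's variational Bayesian codelengths (illustrative assumption only).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                          # cause
y = np.tanh(2 * x) + 0.1 * rng.normal(size=n)   # effect = f(cause) + noise

def conditional_codelength(inp, out, degree=5):
    """Gaussian NLL of a polynomial fit plus a BIC-style parameter penalty."""
    design = np.vander(inp, degree + 1)
    coef, *_ = np.linalg.lstsq(design, out, rcond=None)
    sigma2 = (out - design @ coef).var() + 1e-12
    nll = 0.5 * len(out) * (np.log(2 * np.pi * sigma2) + 1)
    return nll + 0.5 * (degree + 2) * np.log(len(out))  # coefficients + noise scale

def marginal_codelength(v):
    """Codelength of the marginal under a Gaussian model (mean + variance)."""
    sigma2 = v.var() + 1e-12
    return 0.5 * len(v) * (np.log(2 * np.pi * sigma2) + 1) + np.log(len(v))

forward = marginal_codelength(x) + conditional_codelength(x, y)   # L(X) + L(Y|X)
backward = marginal_codelength(y) + conditional_codelength(y, x)  # L(Y) + L(X|Y)
print("inferred direction:", "X -> Y" if forward < backward else "Y -> X")
```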
arXiv Detail & Related papers (2025-05-12T12:40:15Z) - Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers [10.206921909332006]
This study investigates the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions).
arXiv Detail & Related papers (2025-01-15T02:54:52Z) - Learning signals defined on graphs with optimal transport and Gaussian process regression [1.1062090350704616]
In computational physics, machine learning has emerged as a powerful complementary tool for efficiently exploring candidate designs in engineering studies. We propose an innovative strategy for Gaussian process regression where inputs are large and sparse graphs with continuous node attributes and outputs are signals defined on the nodes of the associated inputs. In addition to enabling signal prediction, the main point of our proposal is to provide confidence intervals on node values, which is crucial for uncertainty quantification and active learning.
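A minimal sketch of the general idea of pairing an optimal-transport distance between graphs with Gaussian process regression (the 1-D Wasserstein distance over node attributes, the exponential kernel, and the added jitter are simplifying assumptions, not the paper's construction):

```python
# Sketch: GP regression where each input "graph" is summarised by its node
# attributes and compared via a 1-D Wasserstein (optimal-transport) distance.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Toy "graphs": sets of continuous node attributes of varying size.
graphs = [rng.normal(loc=mu, scale=1.0, size=rng.integers(20, 40))
          for mu in np.linspace(-2, 2, 12)]
targets = np.array([g.mean() ** 2 for g in graphs])  # signal to regress

def kernel(gs_a, gs_b, lengthscale=1.0):
    d = np.array([[wasserstein_distance(a, b) for b in gs_b] for a in gs_a])
    return np.exp(-(d ** 2) / (2 * lengthscale ** 2))

idx = rng.permutation(len(graphs))
train_idx, test_idx = idx[:9], idx[9:]
train = [graphs[i] for i in train_idx]
test = [graphs[i] for i in test_idx]
y_train = targets[train_idx]

# Jitter: a Wasserstein-based kernel is not guaranteed positive semi-definite.
K = kernel(train, train) + 1e-6 * np.eye(len(train))
K_star = kernel(test, train)

alpha = np.linalg.solve(K, y_train)
mean_pred = K_star @ alpha
var_pred = 1.0 - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))

print("predicted:", np.round(mean_pred, 2))
print("actual:   ", np.round(targets[test_idx], 2))
print("predictive std:", np.round(np.sqrt(np.clip(var_pred, 0, None)), 3))
```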
arXiv Detail & Related papers (2024-10-21T07:39:44Z) - Generalization Error Guaranteed Auto-Encoder-Based Nonlinear Model Reduction for Operator Learning [12.124206935054389]
In this paper, we utilize low-dimensional nonlinear structures in model reduction by investigating the Auto-Encoder-based Neural Network (AENet).
Our numerical experiments validate the ability of AENet to accurately learn the solution operator of nonlinear partial differential equations.
Our theoretical framework shows that the sample complexity of training AENet is intricately tied to the intrinsic dimension of the modeled process.
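As a toy illustration of auto-encoder-based nonlinear model reduction, the sketch below compresses synthetic high-dimensional states that lie on a low-dimensional manifold; the architecture, data, and training loop are assumptions, and the downstream operator-learning step is omitted:

```python
# Sketch: compress 64-dimensional observations that actually live on a
# 2-dimensional manifold, mirroring the low intrinsic dimension exploited by
# auto-encoder-based model reduction. Not the AENet of the paper.
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data with intrinsic dimension 2: x = g(z) for a 2-D latent z.
z = torch.rand(2048, 2)
lift = torch.randn(2, 64)
X = torch.sin(z @ lift)  # 64-dim observations on a 2-D manifold

latent_dim = 2
encoder = nn.Sequential(nn.Linear(64, 32), nn.Tanh(), nn.Linear(32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, 64))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

print(f"reconstruction MSE with a {latent_dim}-dim latent space: {loss.item():.4f}")
```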
arXiv Detail & Related papers (2024-01-19T05:01:43Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
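A toy sketch of gradient-based feature learning with an explicit weight-normalization step on anisotropic (spiked) data; the single-index target, step size, and normalization scheme are illustrative assumptions rather than the paper's exact setting:

```python
# Sketch: learn a single feature direction by gradient descent on a spiked
# (anisotropic) input distribution, renormalizing the weight after each step.
import numpy as np

rng = np.random.default_rng(0)
d, n, spike_strength = 50, 5000, 10.0

u = np.zeros(d)
u[0] = 1.0                                        # spike / true feature direction
cov = np.eye(d) + spike_strength * np.outer(u, u)
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
y = np.maximum(X @ u, 0.0)                        # single-index target: relu(<u, x>)

w = rng.normal(size=d)
w /= np.linalg.norm(w)

lr = 0.01
for _ in range(200):
    pred = np.maximum(X @ w, 0.0)
    grad = X.T @ ((pred - y) * (X @ w > 0)) / n   # gradient of the squared loss
    w -= lr * grad
    w /= np.linalg.norm(w)                        # weight-normalization step

print("alignment with the true direction after training:", abs(w @ u))
```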
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Physics-Guided Problem Decomposition for Scaling Deep Learning of High-dimensional Eigen-Solvers: The Case of Schrödinger's Equation [8.80823317679047]
Deep neural networks (NNs) have been proposed as a viable alternative to traditional simulation-driven approaches for solving high-dimensional eigenvalue equations.
In this paper, we use physics knowledge to decompose the complex regression task of predicting the high-dimensional eigenvectors into simpler sub-tasks.
We demonstrate the efficacy of such physics-guided problem decomposition for the case of the Schrödinger Equation in Quantum Mechanics.
arXiv Detail & Related papers (2022-02-12T05:59:08Z) - Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of stochasticity in its success is still unclear.
We show that heavy-tailed behaviour commonly arises in the parameters as a consequence of multiplicative noise.
A detailed analysis describes how key factors, including step size and data, influence this behaviour, with similar results observed across state-of-the-art neural network models.
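As a toy illustration of how multiplicative noise can produce heavy-tailed parameter fluctuations, the sketch below iterates a Kesten-type recursion; this simplification is my own and not the paper's SGD analysis:

```python
# Sketch: iterating x_{t+1} = a_t * x_t + b_t with random multiplicative
# factors a_t produces heavy-tailed stationary fluctuations even though
# a_t and b_t themselves are light-tailed (Gaussian).
import numpy as np

rng = np.random.default_rng(0)
n_chains, n_steps = 20000, 1000

x = np.zeros(n_chains)
for _ in range(n_steps):
    a = 0.95 + 0.3 * rng.normal(size=n_chains)  # contraction on average, noisy factor
    b = 0.1 * rng.normal(size=n_chains)          # additive noise
    x = a * x + b

def kurtosis(v):
    return ((v - v.mean()) ** 4).mean() / v.var() ** 2

# Heavy tails show up as kurtosis far above the Gaussian value of ~3.
gaussian_ref = rng.normal(size=n_chains) * x.std()
print(f"kurtosis of iterated process: {kurtosis(x):.1f}")
print(f"kurtosis of Gaussian reference: {kurtosis(gaussian_ref):.1f}")
```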
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.