Transformative or Conservative? Conservation laws for ResNets and Transformers
- URL: http://arxiv.org/abs/2506.06194v1
- Date: Fri, 06 Jun 2025 15:53:35 GMT
- Title: Transformative or Conservative? Conservation laws for ResNets and Transformers
- Authors: Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
- Abstract summary: This paper bridges the gap by deriving and analyzing conservation laws for modern architectures. We first show that basic building blocks such as ReLU (or linear) shallow networks, with or without convolution, have easily expressed conservation laws. We then introduce the notion of conservation laws that depend only on a subset of parameters.
- Score: 28.287184613608435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While conservation laws in gradient flow training dynamics are well understood for (mostly shallow) ReLU and linear networks, their study remains largely unexplored for more practical architectures. This paper bridges this gap by deriving and analyzing conservation laws for modern architectures, with a focus on convolutional ResNets and Transformer networks. For this, we first show that basic building blocks such as ReLU (or linear) shallow networks, with or without convolution, have easily expressed conservation laws, and no more than the known ones. In the case of a single attention layer, we also completely describe all conservation laws, and we show that residual blocks have the same conservation laws as the same block without a skip connection. We then introduce the notion of conservation laws that depend only on a subset of parameters (corresponding e.g. to a pair of consecutive layers, to a residual block, or to an attention layer). We demonstrate that the characterization of such laws can be reduced to the analysis of the corresponding building block in isolation. Finally, we examine how these newly discovered conservation principles, initially established in the continuous gradient flow regime, persist under discrete optimization dynamics, particularly in the context of Stochastic Gradient Descent (SGD).
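As a concrete illustration of the "known" conservation laws the abstract refers to, here is a minimal sketch, assuming a toy regression setup (the data, layer widths, and step size below are arbitrary choices, not from the paper): for a shallow ReLU network trained on a squared loss, the per-neuron balance quantities ||w_i||^2 - v_i^2 are conserved under gradient flow, and small-step gradient descent preserves them up to a tiny drift.

```python
# Minimal sketch (not the paper's code): numerically check the classical
# per-neuron "balance" conservation law ||w_i||^2 - v_i^2 for a shallow
# ReLU network trained with small-step gradient descent (approximating
# the continuous-time gradient flow).
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 5, 8                # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W = rng.normal(size=(h, d)) * 0.5  # hidden-layer weights (rows w_i)
v = rng.normal(size=h) * 0.5       # output weights v_i

def balance(W, v):
    # One candidate conserved quantity per hidden neuron.
    return np.sum(W**2, axis=1) - v**2

lr = 1e-4                          # small step ~ gradient-flow regime
q0 = balance(W, v)
for _ in range(20000):
    pre = X @ W.T                  # (n, h) pre-activations
    act = np.maximum(pre, 0.0)     # ReLU
    err = act @ v - y              # residuals of the MSE loss
    grad_v = act.T @ err / n
    grad_W = ((err[:, None] * v) * (pre > 0)).T @ X / n
    W -= lr * grad_W
    v -= lr * grad_v

print("max drift of conserved quantities:", np.abs(balance(W, v) - q0).max())
```

Replacing plain gradient descent with a momentum update lets these quantities drift, which is the "conservation loss" phenomenon discussed in the related momentum paper below.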
Related papers
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-06-26T22:40:30Z)
- A Signed Graph Approach to Understanding and Mitigating Oversmoothing in GNNs [54.62268052283014]
We present a unified theoretical perspective based on the framework of signed graphs. We show that many existing strategies implicitly introduce negative edges that alter message-passing to resist oversmoothing. We propose Structural Balanced Propagation (SBP), a plug-and-play method that assigns signed edges based on either labels or feature similarity.
arXiv Detail & Related papers (2025-02-17T03:25:36Z)
- Dynamical freezing in the thermodynamic limit: the strongly driven ensemble [37.31317754926534]
A periodically driven (Floquet) system in the absence of any conservation law heats to a featureless "infinite temperature" state.
Here, we find, for a clean and interacting generic spin chain, that this can be prevented by the emergence of approximate but stable conservation laws not present in the undriven system.
We show numerically, in the thermodynamic limit, that when required by these emergent conservation laws, the entanglement-entropy density of an infinite subsystem remains zero.
arXiv Detail & Related papers (2024-10-14T19:57:43Z)
- Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows [28.287184613608435]
We show that conservation laws for momentum-based dynamics exhibit temporal dependence.
We also observe a "conservation loss" when transitioning from gradient flow to momentum dynamics.
This phenomenon also manifests in non-Euclidean metrics used, e.g., for Nonnegative Matrix Factorization (NMF).
arXiv Detail & Related papers (2024-05-21T15:59:55Z)
- Simple Cycle Reservoirs are Universal [0.358439716487063]
Reservoir models form a subclass of recurrent neural networks with fixed non-trainable input and dynamic coupling weights.
We show that they are capable of universal approximation of any unrestricted linear reservoir system.
arXiv Detail & Related papers (2023-08-21T15:35:59Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Log-linear Guardedness and its Implications [116.87322784046926]
Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
arXiv Detail & Related papers (2022-10-18T17:30:02Z)
- Accumulative reservoir construction: Bridging continuously relaxed and periodically refreshed extended reservoirs [0.0]
We introduce an accumulative reservoir construction that employs a series of partial refreshes of the extended reservoirs.
This provides a unified framework for both continuous (Lindblad) relaxation and a recently introduced periodically refreshed approach.
We show how the range of behavior impacts errors and the computational cost, including within tensor networks.
arXiv Detail & Related papers (2022-10-10T17:59:58Z)
- Orthogonalizing Convolutional Layers with the Cayley Transform [83.73855414030646]
We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions.
arXiv Detail & Related papers (2021-04-14T23:54:55Z)
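For intuition on the Cayley-transform parameterization in the paper above, here is a minimal dense-matrix sketch (an illustration under simplifying assumptions, not the paper's FFT-based construction for convolutional layers): mapping unconstrained parameters M to the skew-symmetric matrix A = M - M^T and applying Q = (I - A)(I + A)^{-1} yields an orthogonal matrix, which the code verifies numerically.

```python
# Minimal sketch: the Cayley transform of a skew-symmetric matrix is orthogonal.
# (I + A) is always invertible here because a skew-symmetric matrix has purely
# imaginary eigenvalues, so -1 is never an eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))          # unconstrained parameters
A = M - M.T                          # skew-symmetric: A.T == -A
I = np.eye(6)
Q = (I - A) @ np.linalg.inv(I + A)   # Cayley transform

print("orthogonality error:", np.abs(Q.T @ Q - I).max())  # should be ~1e-15
```

The paper applies this idea to convolutional layers; the dense version above only illustrates why the transform preserves orthogonality.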
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.