Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction
- URL: http://arxiv.org/abs/2503.17138v1
- Date: Fri, 21 Mar 2025 13:39:04 GMT
- Title: Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction
- Authors: Léo Meynent, Ivan Melev, Konstantin Schürholt, Göran Kauermann, Damian Borth
- Abstract summary: One approach to leverage NN weights involves training autoencoders (AEs) using contrastive and reconstruction losses. However, such AEs reconstruct NN models with deteriorated performance compared to the original ones, limiting their usability with regard to model weight generation. We show a strong synergy between structural and behavioral signals, leading to increased performance in all downstream tasks evaluated.
- Score: 6.926413609535758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The weights of neural networks (NNs) have recently gained prominence as a new data modality in machine learning, with applications ranging from accuracy and hyperparameter prediction to representation learning or weight generation. One approach to leverage NN weights involves training autoencoders (AEs) using contrastive and reconstruction losses. This allows such models to be applied to a wide variety of downstream tasks, and they demonstrate strong predictive performance and low reconstruction error. However, despite the low reconstruction error, these AEs reconstruct NN models with deteriorated performance compared to the original ones, limiting their usability with regard to model weight generation. In this paper, we identify a limitation of weight-space AEs, specifically highlighting that a structural loss, which uses the Euclidean distance between original and reconstructed weights, fails to capture some features critical for reconstructing high-performing models. We analyze the addition of a behavioral loss for training AEs in weight space, where we compare the output of the reconstructed model with that of the original one, given some common input. We show a strong synergy between structural and behavioral signals, leading to increased performance in all downstream tasks evaluated, in particular NN weight reconstruction and generation.
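As a minimal sketch of the idea described in the abstract, the snippet below combines a structural loss (mean squared error between the original and reconstructed flattened weights) with a behavioral loss (mismatch between the outputs of the original and reconstructed models on a common probe batch). The autoencoder, the fixed one-hidden-layer MLP target architecture, and all names (WeightAE, mlp_forward, combined_loss) are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (PyTorch): structural + behavioral loss for a
# weight-space autoencoder. Names and the fixed MLP target architecture
# are assumptions for illustration, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightAE(nn.Module):
    """Autoencoder over flattened NN weight vectors."""
    def __init__(self, weight_dim: int, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(weight_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, weight_dim))

    def forward(self, w_flat):
        return self.decoder(self.encoder(w_flat))

def mlp_forward(w_flat, x, in_dim=784, hidden=32, out_dim=10):
    """Functional forward pass of a one-hidden-layer MLP whose parameters
    are stored in a single flat vector, so gradients flow back into w_flat."""
    i = 0
    W1 = w_flat[i:i + hidden * in_dim].view(hidden, in_dim); i += hidden * in_dim
    b1 = w_flat[i:i + hidden]; i += hidden
    W2 = w_flat[i:i + out_dim * hidden].view(out_dim, hidden); i += out_dim * hidden
    b2 = w_flat[i:i + out_dim]
    return F.linear(F.relu(F.linear(x, W1, b1)), W2, b2)

def combined_loss(ae, w_flat, probe_x, alpha=0.5):
    """Structural (squared-Euclidean) plus behavioral (output-matching) loss."""
    w_rec = ae(w_flat)
    structural = F.mse_loss(w_rec, w_flat)
    # Behavioral term: run the original and the reconstructed weights on the
    # same probe inputs and penalize the difference between their outputs.
    behavioral = F.mse_loss(mlp_forward(w_rec, probe_x),
                            mlp_forward(w_flat, probe_x))
    return alpha * structural + (1 - alpha) * behavioral
```

Here alpha balances the two signals; the abstract's claim is that the structural and behavioral terms are complementary rather than redundant.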
Related papers
- An Overview of Low-Rank Structures in the Training and Adaptation of Large Models [52.67110072923365]
Recent research has uncovered a widespread phenomenon in deep networks: the emergence of low-rank structures.
These implicit low-dimensional patterns provide valuable insights for improving the efficiency of training and fine-tuning large-scale models.
We present a comprehensive review of advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations.
arXiv Detail & Related papers (2025-03-25T17:26:09Z)
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- Symplectic Autoencoders for Model Reduction of Hamiltonian Systems [0.0]
It is crucial to preserve the symplectic structure associated with the system in order to ensure long-term numerical stability.
We propose a new neural network architecture in the spirit of autoencoders, which are established tools for dimension reduction.
In order to train the network, a non-standard gradient descent approach is applied.
arXiv Detail & Related papers (2023-12-15T18:20:25Z)
- Conditional deep generative models as surrogates for spatial field solution reconstruction with quantified uncertainty in Structural Health Monitoring applications [0.0]
In problems related to Structural Health Monitoring (SHM), models capable of both handling high-dimensional data and quantifying uncertainty are required.
We propose a conditional deep generative model as a surrogate aimed at such applications and high-dimensional structural simulations in general.
The model is able to achieve high reconstruction accuracy compared to the reference Finite Element (FE) solutions, while at the same time successfully encoding the load uncertainty.
arXiv Detail & Related papers (2023-02-14T20:13:24Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's ability to reconstruct the input and its prediction performance, based on the evolution of its parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space (see the averaging sketch after this list).
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
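The late-phase weights entry above recovers a single model by averaging an ensemble in weight space; a minimal sketch of such weight-space averaging for models sharing one architecture is given below. The helper name average_state_dicts is illustrative and not taken from that paper.

```python
# Minimal sketch of averaging several models of identical architecture in
# weight space. The helper name average_state_dicts is an illustrative
# assumption, not code from the cited paper.
import copy
import torch

def average_state_dicts(models):
    """Return a new model whose parameters are the element-wise mean of the
    given models' parameters (all models must share the same architecture)."""
    avg_model = copy.deepcopy(models[0])
    avg_state = avg_model.state_dict()
    for key in avg_state:
        # Stack the corresponding tensors from every model and take the mean,
        # casting back to the original dtype (covers integer buffers too).
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        avg_state[key] = stacked.mean(dim=0).to(avg_state[key].dtype)
    avg_model.load_state_dict(avg_state)
    return avg_model
```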