Related papers: Generative Modeling of Weights: Generalization or Memorization?

URL: http://arxiv.org/abs/2506.07998v2
Date: Fri, 03 Oct 2025 17:58:30 GMT
Title: Generative Modeling of Weights: Generalization or Memorization?
Authors: Boya Zeng, Yida Yin, Zhiqiu Xu, Zhuang Liu
Abstract summary: Generative models take neural network checkpoints as training data and aim to generate high-performing weights during inference. In this work, we examine four representative, well-known methods on their ability to generate novel model weights. We find that these methods synthesize weights largely by memorization: they produce replicas or, at best, simple interpolations of the training checkpoints.
Score: 10.591494826215142
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative models have recently been explored for synthesizing neural network weights. These approaches take neural network checkpoints as training data and aim to generate high-performing weights during inference. In this work, we examine four representative, well-known methods on their ability to generate novel model weights, i.e., weights that are different from the checkpoints seen during training. Contrary to claims in prior work, we find that these methods synthesize weights largely by memorization: they produce either replicas, or, at best, simple interpolations of the training checkpoints. Moreover, they fail to outperform simple baselines, such as adding noise to the weights or taking a simple weight ensemble, in obtaining different and simultaneously high-performing models. Our further analysis suggests that this memorization might result from limited data, overparameterized models, and the underuse of structural priors specific to weight data. These findings highlight the need for more careful design and rigorous evaluation of generative models when applied to new domains. Our code is available at https://github.com/boyazeng/weight_memorization.

Related papers

The Impact of Model Zoo Size and Composition on Weight Space Learning [8.11780615053558]
Re-using trained neural network models is a common strategy to reduce training cost and transfer knowledge. Weight space learning is a promising new field to re-use populations of pre-trained models for future tasks. We propose a modification to a common weight space learning method to accommodate training on heterogeneous populations of models.
arXiv Detail & Related papers (2025-04-14T11:54:06Z)
Generative Feature Training of Thin 2-Layer Networks [0.0]
We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on squared loss and small datasets. As a highly hidden model, we exploit hidden weights with samples from a learned proposal distribution. We refine the sampled weights with gradient-based post-processing in the latent space.
arXiv Detail & Related papers (2024-11-11T10:32:33Z)
Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks. NeuMeta directly learns the continuous weight manifold of neural networks. It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights. By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation. Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
Stochastic Approximation Approach to Federated Machine Learning [0.0]
This paper examines federated learning (FL) in a stochastic approximation (SA) framework. FL is a collaborative way to train neural network models across various participants or clients. It is observed that the proposed algorithm is robust and gives more reliable estimates of the weights.
arXiv Detail & Related papers (2024-02-20T12:00:25Z)
Initializing Models with Larger Ones [76.41561758293055]
We introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model. Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time.
arXiv Detail & Related papers (2023-11-30T18:58:26Z)
Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z)
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation. We use the score-based diffusion model to generate labeled data. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources. Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model. We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data. We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
Revealing Secrets From Pre-trained Models [2.0249686991196123]
Transfer-learning has been widely adopted in many emerging deep learning algorithms. We show that pre-trained models and fine-tuned models have significantly high similarities in weight values. We propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model.
arXiv Detail & Related papers (2022-07-19T20:19:03Z)
Amortized learning of neural causal representations [10.140457813764554]
Causal models can compactly and efficiently encode the data-generating process under all interventions. These models are often represented as Bayesian networks, and learning them scales poorly with the number of variables. We present a novel algorithm called causal relational networks (CRN) for learning causal models using neural networks.
arXiv Detail & Related papers (2020-08-21T04:35:06Z)
Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we recover a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model. Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
Efficient Learning of Model Weights via Changing Features During Training [0.0]
We propose a machine learning model that dynamically changes the features during training. Our main motivation is to update the model to a small extent during the training process by replacing less descriptive features with new ones from a large pool.
arXiv Detail & Related papers (2020-02-21T12:38:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.