NeuroGen: Neural Network Parameter Generation via Large Language Models
- URL: http://arxiv.org/abs/2505.12470v2
- Date: Fri, 23 May 2025 06:25:28 GMT
- Title: NeuroGen: Neural Network Parameter Generation via Large Language Models
- Authors: Jiaqi Wang, Yusen Zhang, Xi Li
- Abstract summary: Acquiring the parameters of neural networks (NNs) has been one of the most important problems in machine learning. This paper aims to explore the feasibility of a new direction: acquiring NN parameters via large language model generation.
- Score: 32.16082052558773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acquiring the parameters of neural networks (NNs) has been one of the most important problems in machine learning since the inception of NNs. Traditional approaches, such as backpropagation and forward-only optimization, acquire parameters via iterative data fitting to gradually optimize them. This paper explores the feasibility of a new direction: acquiring NN parameters via large language model generation. We propose NeuroGen, a generalized and easy-to-implement two-stage approach for NN parameter generation conditioned on descriptions of the data, task, and network architecture. Stage one is Parameter Reference Knowledge Injection, where LLMs are pretrained on NN checkpoints to build a foundational understanding of the parameter space; stage two is Context-Enhanced Instruction Tuning, which enables LLMs to adapt to specific tasks through enriched, task-aware prompts. Experimental results demonstrate that NeuroGen effectively generates usable NN parameters. Our findings highlight the feasibility of LLM-based NN parameter generation and suggest a promising new paradigm in which LLMs and lightweight NNs can coexist synergistically.
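To make the end product concrete, here is a minimal sketch (not the authors' code; the MLP architecture and the random stand-in for the LLM output are illustrative assumptions) of how a generated flat parameter vector can be loaded into a lightweight NN in PyTorch:

```python
import torch
import torch.nn as nn

# A lightweight target network of the kind NeuroGen generates parameters for.
net = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
n_params = sum(p.numel() for p in net.parameters())

# Stand-in for the LLM-generated parameter vector (random values here).
generated = torch.randn(n_params)

# Copy the generated values into the network's parameter tensors.
torch.nn.utils.vector_to_parameters(generated, net.parameters())

logits = net(torch.randn(1, 784))  # the generated network is immediately usable
```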
Related papers
- Neural Parameter Regression for Explicit Representations of PDE Solution Operators [22.355460388065964]
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs).
NPR employs Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters.
The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference.
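A hedged sketch of the general hypernetwork idea this points at (sizes and names are assumptions, not the paper's exact architecture): a regression network maps an encoding of the initial/boundary conditions to the full parameter vector of a small solver network.

```python
import torch
import torch.nn as nn

# Small solver network u(x, t) whose parameters are regressed, not trained.
solver = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
n_target = sum(p.numel() for p in solver.parameters())

# Hypernetwork: PDE-condition encoding -> flat solver parameter vector.
hyper = nn.Sequential(nn.Linear(16, 256), nn.Tanh(), nn.Linear(256, n_target))

cond = torch.randn(16)                  # encoding of one initial/boundary condition
theta = hyper(cond)                     # predicted solver parameters
torch.nn.utils.vector_to_parameters(theta.detach(), solver.parameters())
u = solver(torch.tensor([[0.5, 0.1]]))  # evaluate the solution at (x, t) = (0.5, 0.1)
```

Because new conditions only change the hypernetwork input, adapting to them reduces to a cheap forward pass plus optional fine-tuning.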
arXiv Detail & Related papers (2024-03-19T14:30:56Z) - Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs.
We derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs.
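For reference, the shape of such a bound in generic notation (assumed here, not quoted from the paper) is:

```latex
% Excess risk of the GD output \hat{w} with n training samples:
\mathbb{E}\big[R(\hat{w})\big] - \inf_{w} R(w) \;=\; O\!\left(\tfrac{1}{\sqrt{n}}\right)
```

i.e., the gap between the learned predictor's population risk and the best achievable risk shrinks at rate $1/\sqrt{n}$.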
arXiv Detail & Related papers (2023-05-26T12:51:38Z) - Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks [9.605853974038936]
Generalising feedforward NNs to RNNs that store short-term memory as neuron activations is mathematically straightforward and natural, and even historical.
A lesser known alternative approach to storing short-term memory in "synaptic connections" yields another "natural" type of short-term memory in sequence processing NNs.
Fast Weight Programmers (FWPs) have seen a recent revival as generic sequence processors, achieving competitive performance across various tasks.
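A minimal sketch of the core fast-weight mechanism (a generic outer-product formulation; the shapes and the softmax key normalization are assumptions rather than any specific paper's recipe): a slow network emits a key, value, and query at each step; the key/value outer product is written into a fast weight matrix, the "synaptic" short-term memory, which the query then reads.

```python
import torch
import torch.nn as nn

def fwp_step(W, x, slow_proj):
    # The slow net produces a write key k, write value v, and read query q.
    k, v, q = slow_proj(x).chunk(3, dim=-1)
    k = torch.softmax(k, dim=-1)                # normalize the write key
    W = W + v.unsqueeze(-1) * k.unsqueeze(-2)   # W += v k^T: write into synapses
    y = (W @ q.unsqueeze(-1)).squeeze(-1)       # read out: y = W q
    return W, y

d = 8
slow_proj = nn.Linear(d, 3 * d)
W = torch.zeros(d, d)                           # fast weights start empty
for x in torch.randn(5, d):                     # process a short sequence
    W, y = fwp_step(W, x, slow_proj)
```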
arXiv Detail & Related papers (2022-11-17T10:03:54Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
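A loudly simplified sketch of the data layout and the resulting "loss-prompted" sampling (the paper trains a generative model on checkpoints; the tiny conditional regressor below is only a stand-in, and every name and size is an assumption):

```python
import torch
import torch.nn as nn

# Hypothetical checkpoint dataset: flat parameter vectors paired with the
# loss each checkpoint achieved (random stand-ins here).
n_ckpts, n_params = 1000, 512
ckpt_params = torch.randn(n_ckpts, n_params)
ckpt_losses = torch.rand(n_ckpts, 1)

# Stand-in generator: (noise, target loss) -> parameter vector.
gen = nn.Sequential(nn.Linear(n_params + 1, 1024), nn.ReLU(),
                    nn.Linear(1024, n_params))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for _ in range(100):
    z = torch.randn(n_ckpts, n_params)
    pred = gen(torch.cat([z, ckpt_losses], dim=-1))
    loss = ((pred - ckpt_params) ** 2).mean()   # crude fit to the checkpoint set
    opt.zero_grad(); loss.backward(); opt.step()

# Loss-prompted sampling: request parameters that should achieve loss 0.1.
prompt = torch.tensor([[0.1]])
sample = gen(torch.cat([torch.randn(1, n_params), prompt], dim=-1))
```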
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Learning Regularization Parameters of Inverse Problems via Deep Neural Networks [0.0]
We consider a supervised learning approach, where a network is trained to approximate the mapping from observation data to regularization parameters.
We show that a wide variety of regularization functionals, forward models, and noise models may be considered.
The network-predicted regularization parameters can be computed more efficiently and may even lead to more accurate solutions.
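A hedged sketch of this supervised setup for a Tikhonov-regularized linear inverse problem (forward model, noise level, and network are illustrative assumptions):

```python
import torch
import torch.nn as nn

m, n = 20, 20
A = torch.randn(m, n)                           # assumed forward model

def tikhonov(b, lam):
    # Closed-form minimizer of ||A x - b||^2 + lam * ||x||^2.
    return torch.linalg.solve(A.T @ A + lam * torch.eye(n), A.T @ b)

# Network mapping observation data b to a positive regularization parameter.
net = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(200):
    x_true = torch.randn(n)
    b = A @ x_true + 0.05 * torch.randn(m)      # noisy observation
    lam = net(b).squeeze()
    x_hat = tikhonov(b, lam)
    loss = ((x_hat - x_true) ** 2).mean()       # supervise with the known solution
    opt.zero_grad(); loss.backward(); opt.step()
```

At test time the trained network replaces hand-tuned parameter-choice rules: one forward pass yields a regularization parameter for a new observation.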
arXiv Detail & Related papers (2021-04-14T02:38:38Z) - On the Sparsity of Neural Machine Translation Models [65.49762428553345]
We investigate whether redundant parameters can be reused to achieve better performance.
Experiments and analyses are systematically conducted on different datasets and NMT architectures.
arXiv Detail & Related papers (2020-10-06T11:47:20Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
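As a one-line illustration of what "represented by probability measures" means, here is the standard two-layer mean-field limit (the textbook case, not the paper's deep construction; notation assumed):

```latex
% Infinite-width two-layer network as an expectation over a measure mu on
% neuron parameters (a, w); training becomes a gradient flow on mu.
f(x) \;=\; \int a\,\sigma\!\left(w^{\top} x\right)\,\mathrm{d}\mu(a, w)
```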
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these NNs using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
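A hedged sketch of such a gradient descent-ascent loop (the moment condition, toy data, and quadratic regularizer are assumptions, not the paper's exact game): f is fit to a conditional moment restriction while an adversarial test function g, also an NN, searches for violations.

```python
import torch
import torch.nn as nn

# Players: f (structural function) and g (adversarial test function).
f = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
g = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

for _ in range(500):
    Z = torch.randn(128, 1)                     # instrument
    X = Z + 0.1 * torch.randn(128, 1)           # toy endogenous regressor
    Y = 2.0 * X + 0.1 * torch.randn(128, 1)     # outcome with true slope 2
    # Game value: E[(Y - f(X)) g(Z)] - 0.5 E[g(Z)^2]; g ascends, f descends.
    game = ((Y - f(X)) * g(Z)).mean() - 0.5 * (g(Z) ** 2).mean()
    opt_g.zero_grad(); (-game).backward(); opt_g.step()
    obj = ((Y - f(X)) * g(Z)).mean()            # fresh pass after g's update
    opt_f.zero_grad(); obj.backward(); opt_f.step()
```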
arXiv Detail & Related papers (2020-07-02T17:55:47Z)