Recurrent Diffusion for Large-Scale Parameter Generation
- URL: http://arxiv.org/abs/2501.11587v2
- Date: Tue, 11 Feb 2025 03:29:30 GMT
- Title: Recurrent Diffusion for Large-Scale Parameter Generation
- Authors: Kai Wang, Dongwen Tang, Wangbo Zhao, Konstantin Schürholt, Zhangyang Wang, Yang You
- Abstract summary: We introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions, on a single GPU. RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
- Score: 52.98888368644455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions, on a single GPU. Our approach first partitions a network's parameters into non-overlapping tokens, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing prototypes which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks, including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs, RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
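The pipeline described in the abstract (partition parameters into tokens, learn prototypes with a recurrent model, condition a diffusion process on them) can be sketched in a few lines. The snippet below is a minimal illustration under assumed shapes: the token size, the GRU-based `PrototypeRNN`, and the MLP `TokenDenoiser` are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

TOKEN_SIZE = 1024  # assumed token length; the paper partitions parameters into non-overlapping tokens

def tokenize(flat_params: torch.Tensor, token_size: int = TOKEN_SIZE) -> torch.Tensor:
    """Partition a flattened parameter vector into non-overlapping, fixed-size tokens."""
    pad = (-flat_params.numel()) % token_size
    flat = torch.cat([flat_params, flat_params.new_zeros(pad)])
    return flat.view(-1, token_size)                       # (num_tokens, token_size)

class PrototypeRNN(nn.Module):
    """Recurrent model over parameter tokens; its hidden states act as per-token prototypes."""
    def __init__(self, token_size=TOKEN_SIZE, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(token_size, hidden, batch_first=True)

    def forward(self, tokens):                              # tokens: (1, num_tokens, token_size)
        prototypes, _ = self.rnn(tokens)
        return prototypes                                   # (1, num_tokens, hidden)

class TokenDenoiser(nn.Module):
    """Toy conditional denoiser: predicts the noise added to each parameter token,
    conditioned on that token's prototype (stand-in for the diffusion stage)."""
    def __init__(self, token_size=TOKEN_SIZE, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_size + hidden + 1, 1024), nn.SiLU(),
            nn.Linear(1024, token_size),
        )

    def forward(self, noisy_tokens, prototypes, t):
        t_emb = t.expand(noisy_tokens.shape[0], 1)          # broadcast scalar timestep per token
        return self.net(torch.cat([noisy_tokens, prototypes, t_emb], dim=-1))

# Usage: tokenize a (pretend) parameter vector and run one denoising step.
params = torch.randn(3_000_000)                             # stand-in for a trained network's parameters
tokens = tokenize(params).unsqueeze(0)
protos = PrototypeRNN()(tokens)
noise_pred = TokenDenoiser()(tokens[0] + torch.randn_like(tokens[0]),
                             protos[0], torch.tensor([[0.5]]))
print(tokens.shape, protos.shape, noise_pred.shape)
```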
Related papers
- Instruction-Guided Autoregressive Neural Network Parameter Generation [49.800239140036496]
We propose IGPG, an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures.
By autoregressively generating neural network weight tokens, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets.
Experiments on multiple datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework.
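A minimal sketch of autoregressive weight-token generation in this spirit is shown below; the discretized weight vocabulary, the 512-dimensional instruction embedding, and the Transformer backbone are assumptions for illustration, not IGPG's actual design.

```python
import torch
import torch.nn as nn

class AutoregressiveWeightGenerator(nn.Module):
    """Toy autoregressive model over a discretized weight-token vocabulary,
    conditioned on a task/instruction embedding (assumed interface)."""
    def __init__(self, vocab_size=4096, d_model=256, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.cond_proj = nn.Linear(512, d_model)             # 512-dim instruction embedding, assumed
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, cond):
        x = self.tok_emb(tokens) + self.cond_proj(cond).unsqueeze(1)
        L = tokens.shape[1]
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)  # causal mask
        return self.head(self.backbone(x, mask=mask))

# Greedy generation of a short weight-token sequence for one instruction embedding.
model = AutoregressiveWeightGenerator()
cond = torch.randn(1, 512)
seq = torch.zeros(1, 1, dtype=torch.long)                    # start token
for _ in range(16):
    logits = model(seq, cond)[:, -1]
    seq = torch.cat([seq, logits.argmax(-1, keepdim=True)], dim=1)
print(seq.shape)                                             # (1, 17)
```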
arXiv Detail & Related papers (2025-04-02T05:50:19Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently the two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
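A toy version of a linear layer upscaled into a sparse mixture of low-rank experts is sketched below; the simple learned linear router and the rank/expert counts are assumptions, and SMILE's actual zero-shot routing rule differs.

```python
import torch
import torch.nn as nn

class SparseLowRankMoELinear(nn.Module):
    """A linear layer upscaled into a sparse mixture of low-rank experts:
    a shared base weight plus top-k routed low-rank deltas (e.g., one per source model)."""
    def __init__(self, d_in, d_out, n_experts=4, rank=8, top_k=1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.A = nn.Parameter(torch.randn(n_experts, rank, d_in) * 0.02)  # down-projections
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))        # up-projections
        self.router = nn.Linear(d_in, n_experts)                          # illustrative router only
        self.top_k = top_k

    def forward(self, x):                                    # x: (batch, d_in)
        out = self.base(x)
        gates = self.router(x).softmax(-1)                   # (batch, n_experts)
        topv, topi = gates.topk(self.top_k, dim=-1)
        for k in range(self.top_k):
            A = self.A[topi[:, k]]                           # (batch, rank, d_in)
            B = self.B[topi[:, k]]                           # (batch, d_out, rank)
            delta = torch.bmm(B, torch.bmm(A, x.unsqueeze(-1))).squeeze(-1)
            out = out + topv[:, k:k + 1] * delta
        return out

layer = SparseLowRankMoELinear(d_in=64, d_out=64)
print(layer(torch.randn(8, 64)).shape)                       # (8, 64)
```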
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Conditional LoRA Parameter Generation [18.34892473337235]
We propose COND P-DIFF, a novel approach that demonstrates the feasibility of controllable high-performance parameter generation.
Experimental results in both computer vision and natural language processing domains consistently demonstrate that COND P-DIFF can generate high-performance parameters conditioned on the given task.
Our work paves the way for further exploration of condition-driven parameter generation, offering a promising direction for task-specific adaptation of neural networks.
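The following sketch shows one training step of a generic conditional diffusion model over flattened LoRA parameters, in the spirit of the idea above; the MLP denoiser, the noise schedule, and the condition dimensionality are illustrative assumptions rather than COND P-DIFF's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy conditional denoiser over flattened LoRA parameters.
PARAM_DIM, COND_DIM, T = 2048, 128, 1000

denoiser = nn.Sequential(
    nn.Linear(PARAM_DIM + COND_DIM + 1, 1024), nn.SiLU(),
    nn.Linear(1024, PARAM_DIM),
)
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(lora_params, task_emb):
    """One denoising-score-matching step: corrupt parameters, predict the added noise."""
    b = lora_params.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(lora_params)
    ab = alpha_bar[t].unsqueeze(-1)
    noisy = ab.sqrt() * lora_params + (1 - ab).sqrt() * noise
    inp = torch.cat([noisy, task_emb, (t.float() / T).unsqueeze(-1)], dim=-1)
    return F.mse_loss(denoiser(inp), noise)

loss = training_step(torch.randn(16, PARAM_DIM), torch.randn(16, COND_DIM))
print(loss.item())
```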
arXiv Detail & Related papers (2024-08-02T17:43:34Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers helps alleviate the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters [31.55846326336193]
Graph HyperNetworks (GHNs) have recently shown strong performance in initializing large vision models.
LoGAH allows us to predict the parameters of neural networks with up to 774 million parameters in a memory-efficient manner.
arXiv Detail & Related papers (2024-05-25T15:56:15Z) - Neural Network Diffusion [45.851945143942885]
A diffusion model is trained to synthesize latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters.
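The two-stage structure described above (a parameter autoencoder plus a diffusion model in latent space) is sketched below; the layer sizes and the crude sampling loop are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

# Stage 1: an autoencoder compresses a parameter subset into a latent code.
# Stage 2: a latent generator (stubbed as a single denoising MLP) produces new latents
# that the decoder maps back to parameters. Sizes are assumptions for illustration.
P_DIM, LATENT = 4096, 256

encoder = nn.Sequential(nn.Linear(P_DIM, 512), nn.SiLU(), nn.Linear(512, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 512), nn.SiLU(), nn.Linear(512, P_DIM))
latent_denoiser = nn.Sequential(nn.Linear(LATENT, 512), nn.SiLU(), nn.Linear(512, LATENT))

# Stage 1 training (not shown) would minimize a reconstruction loss, e.g.
#   loss = mse(decoder(encoder(params)), params)
# Stage 2 sampling: start from noise, refine in latent space, decode to parameters.
z = torch.randn(1, LATENT)
for _ in range(10):                      # crude fixed-point loop, not a real DDPM sampler
    z = latent_denoiser(z)
new_params = decoder(z)
print(new_params.shape)                  # (1, 4096): a generated parameter subset
```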
arXiv Detail & Related papers (2024-02-20T16:59:03Z) - Tracing Hyperparameter Dependencies for Model Parsing via Learnable Graph Pooling Network [21.484648648511854]
We propose a novel model parsing method called Learnable Graph Pooling Network (LGPN).
LGPN incorporates a learnable pooling-unpooling mechanism tailored to model parsing.
We extend our proposed method to CNN-generated image detection and coordinate attacks detection.
arXiv Detail & Related papers (2023-12-03T22:05:05Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
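As a rough illustration of conditioning parameter generation on a prompted loss, the sketch below maps a starting checkpoint and a scalar loss prompt to new parameters; the MLP backbone and dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy checkpoint-conditioned generator: given a flattened starting parameter vector
# and a prompted target loss, predict an updated parameter vector.
P_DIM = 1024

generator = nn.Sequential(
    nn.Linear(P_DIM + 1, 2048), nn.SiLU(),   # +1 for the scalar loss prompt
    nn.Linear(2048, P_DIM),
)

start_params = torch.randn(4, P_DIM)         # a batch of checkpoints (flattened)
loss_prompt = torch.full((4, 1), 0.1)        # "give me parameters that reach loss ~0.1"
new_params = generator(torch.cat([start_params, loss_prompt], dim=-1))
print(new_params.shape)                      # (4, 1024)
```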
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion [89.92242000948026]
We propose a transformer architecture based on a dedicated encoder/decoder framework.
Through a dynamic expansion of special tokens, we specialize each forward of our decoder network on a task distribution.
Our strategy scales to a large number of tasks while having negligible memory and time overheads.
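A minimal sketch of per-task token expansion is given below: each new task adds one learned token, and a forward pass attends over shared features with the chosen task's token. The single attention block and the dimensions are illustrative assumptions rather than DyTox's exact decoder.

```python
import torch
import torch.nn as nn

class TaskTokenDecoder(nn.Module):
    """Decoder sketch with per-task learned tokens that grows as new tasks arrive."""
    def __init__(self, dim=128):
        super().__init__()
        self.task_tokens = nn.ParameterList()                 # one learned token per task
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.dim = dim

    def add_task(self):
        self.task_tokens.append(nn.Parameter(torch.randn(1, 1, self.dim) * 0.02))

    def forward(self, feats, task_id):                        # feats: (batch, n_patches, dim)
        q = self.task_tokens[task_id].expand(feats.shape[0], -1, -1)
        out, _ = self.attn(q, feats, feats)                   # task token attends to shared features
        return out.squeeze(1)                                 # (batch, dim) task-specialized embedding

dec = TaskTokenDecoder()
dec.add_task(); dec.add_task()                                # two tasks seen so far
feats = torch.randn(8, 49, 128)
print(dec(feats, task_id=1).shape)                            # (8, 128)
```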
arXiv Detail & Related papers (2021-11-22T16:29:06Z) - Recurrent Parameter Generators [42.159272098922685]
We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network.
We demonstrate how to build a one-layer neural network that achieves performance similar to traditional CNN models.
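The sketch below illustrates the general idea of reusing one shared parameter bank across many convolution layers; the permutation-based mapping from bank to layer weights is an illustrative assumption, not necessarily the paper's generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBankConvNet(nn.Module):
    """Every conv layer draws its weights from one shared parameter bank
    via a fixed random permutation, so the parameter count stays roughly one layer's worth."""
    def __init__(self, n_layers=6, channels=32):
        super().__init__()
        w_numel = channels * channels * 3 * 3
        self.bank = nn.Parameter(torch.randn(w_numel) * 0.05)     # shared across all layers
        self.perms = [torch.randperm(w_numel) for _ in range(n_layers)]
        self.channels = channels
        self.stem = nn.Conv2d(3, channels, 3, padding=1)

    def forward(self, x):
        x = F.relu(self.stem(x))
        c = self.channels
        for p in self.perms:
            w = self.bank[p].view(c, c, 3, 3)                     # layer-specific view of the bank
            x = F.relu(F.conv2d(x, w, padding=1))
        return x

net = SharedBankConvNet()
print(net(torch.randn(2, 3, 32, 32)).shape)                       # (2, 32, 32, 32)
print(sum(p.numel() for p in net.parameters()))                   # bank + stem only
```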
arXiv Detail & Related papers (2021-07-15T04:23:59Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
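A simplified sketch of layer-wise fusion with neuron alignment is shown below; it uses a hard one-to-one assignment (via `scipy.optimize.linear_sum_assignment`) as a stand-in for the paper's full optimal-transport alignment.

```python
import torch
from scipy.optimize import linear_sum_assignment

def fuse_linear_layers(w_a, w_b):
    """Align model B's neurons (rows of its weight matrix) to model A's via a
    minimal-cost one-to-one matching on weight similarity, then average.
    A full fusion would also permute the next layer's input dimensions accordingly."""
    cost = torch.cdist(w_a, w_b)                           # (n_out, n_out) pairwise neuron distances
    row, col = linear_sum_assignment(cost.numpy())         # hard alignment (simplification of OT)
    w_b_aligned = w_b[torch.as_tensor(col)]
    return 0.5 * (w_a + w_b_aligned)

w_a, w_b = torch.randn(64, 128), torch.randn(64, 128)      # same layer from two trained models
print(fuse_linear_layers(w_a, w_b).shape)                  # (64, 128)
```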
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.