ParameterNet: Parameters Are All You Need
- URL: http://arxiv.org/abs/2306.14525v2
- Date: Sun, 14 Jan 2024 12:20:01 GMT
- Title: ParameterNet: Parameters Are All You Need
- Authors: Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu
- Abstract summary: We introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models.
We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs.
The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining.
- Score: 50.150436250355945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale visual pretraining has significantly improved the performance
of large vision models. However, we observe the \emph{low FLOPs pitfall} that
the existing low-FLOPs models cannot benefit from large-scale pretraining. In
this paper, we introduce a novel design principle, termed ParameterNet, aimed
at augmenting the number of parameters in large-scale visual pretraining models
while minimizing the increase in FLOPs. We leverage dynamic convolutions to
incorporate additional parameters into the networks with only a marginal rise
in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage
of large-scale visual pretraining. Furthermore, we extend the ParameterNet
concept to the language domain to enhance inference results while preserving
inference speed. Experiments on the large-scale ImageNet-22K have shown the
superiority of our ParameterNet scheme. For example, ParameterNet-600M can
achieve higher accuracy on ImageNet than the widely-used Swin Transformer
(81.6\% \emph{vs.} 80.9\%) and has much lower FLOPs (0.6G \emph{vs.} 4.5G). In
the language domain, LLaMA-1B enhanced with ParameterNet achieves 2\% higher
accuracy than vanilla LLaMA. The code will be released at
\url{https://parameternet.github.io/}.
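The core mechanism named in the abstract is dynamic convolution: a bank of expert kernels is mixed by input-dependent weights from a tiny routing head, so the parameter count grows with the number of experts while the convolution's FLOPs barely change. Below is a minimal PyTorch sketch of this idea; the layer name, expert count, and routing head are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of a dynamic convolution layer (illustrative, not the official ParameterNet code).
# K expert kernels are mixed per sample by a tiny routing head, then a single conv is applied,
# so parameters scale with K while the conv FLOPs stay essentially constant.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4, padding=1):
        super().__init__()
        self.out_ch, self.in_ch, self.k, self.padding = out_ch, in_ch, kernel_size, padding
        # K expert kernels: parameter count grows with K, conv FLOPs do not.
        self.weight = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Routing head: global average pooling -> linear -> softmax over experts.
        self.router = nn.Linear(in_ch, num_experts)

    def forward(self, x):
        b, c, h, w = x.shape
        scores = F.softmax(self.router(x.mean(dim=(2, 3))), dim=-1)      # (B, K)
        mixed = torch.einsum('bk,koihw->boihw', scores, self.weight)     # per-sample kernels
        mixed = mixed.reshape(b * self.out_ch, self.in_ch, self.k, self.k)
        x = x.reshape(1, b * c, h, w)                                    # grouped conv over the batch
        out = F.conv2d(x, mixed, padding=self.padding, groups=b)
        return out.reshape(b, self.out_ch, h, w)
```

With num_experts=4 the layer stores roughly four times the weights of a plain convolution, while the only extra compute is the pooling, the small router, and the kernel mixing, which matches the low-FLOPs, high-parameter design the abstract describes.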
Related papers
- Recurrent Diffusion for Large-Scale Parameter Generation [52.98888368644455]
We introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions, on a single GPU.
RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
arXiv Detail & Related papers (2025-01-20T16:46:26Z)
- LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning.
We extend LoRA to multiple scales, dubbed LoRA$^2$.
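As a rough illustration of the multi-scale idea, the sketch below attaches parallel low-rank branches of different ranks to a frozen linear layer; the class name, scaling, and initialization are assumptions, and the exact LoRA$^2$ formulation may differ.

```python
# Illustrative multi-scale LoRA sketch (hypothetical MultiScaleLoRALinear, not the paper's code).
import torch.nn as nn

class MultiScaleLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, ranks=(4, 16), alpha=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weight stays frozen
        self.alpha = alpha
        self.downs = nn.ModuleList(nn.Linear(base.in_features, r, bias=False) for r in ranks)
        self.ups = nn.ModuleList(nn.Linear(r, base.out_features, bias=False) for r in ranks)
        for up in self.ups:
            nn.init.zeros_(up.weight)                    # updates start at zero, as in standard LoRA

    def forward(self, x):
        out = self.base(x)
        for down, up in zip(self.downs, self.ups):       # sum of low-rank updates at several ranks
            out = out + self.alpha * up(down(x))
        return out
```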
arXiv Detail & Related papers (2024-08-13T12:31:30Z)
- Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis [16.253898272659242]
This study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to feedforward networks (FFNs).
Experiments on the large RefinedWeb dataset show that low-rank parametrization is both efficient (e.g., a 2.6$\times$ FFN speed-up with 32% of the parameters) and effective during training.
Motivated by this finding, we develop wide and structured networks that surpass current medium-sized and large Transformers in both perplexity and throughput.
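A minimal sketch of low-rank FFN parametrization, assuming each dense FFN weight is factorized into two thin matrices; the ranks and dimensions below are illustrative, not the paper's configuration.

```python
# Low-rank factorized transformer FFN (illustrative sizes).
import torch.nn as nn

class LowRankFFN(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, rank=128):
        super().__init__()
        # W1 (d_model x d_ff) ~ A1 (d_model x rank) @ B1 (rank x d_ff); likewise for W2.
        self.up = nn.Sequential(nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_ff))
        self.act = nn.GELU()
        self.down = nn.Sequential(nn.Linear(d_ff, rank, bias=False), nn.Linear(rank, d_model))

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```

Under these illustrative sizes the FFN weights shrink from about 4.7M parameters (2 x 768 x 3072) to about 1.0M, and both matrix multiplications get correspondingly cheaper.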
arXiv Detail & Related papers (2024-07-13T10:08:55Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters.
Global thresholds are used to update model parameters by extracting aggregated parameter importance.
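A loose sketch of that communication pattern, assuming magnitude pruning against per-layer thresholds; the function names and the simple averaging rule are assumptions, not SpaFL's exact procedure.

```python
# Illustrative threshold-only communication: clients send per-layer thresholds, not weights.
import torch

def prune_mask(weights: torch.Tensor, tau: float) -> torch.Tensor:
    """Keep only weights whose magnitude exceeds the threshold tau."""
    return (weights.abs() > tau).float()

def aggregate_thresholds(client_thresholds: list[dict[str, float]]) -> dict[str, float]:
    """Server side: only these scalars travel over the network (assumed simple mean)."""
    layers = client_thresholds[0].keys()
    return {name: sum(t[name] for t in client_thresholds) / len(client_thresholds)
            for name in layers}

def apply_global_thresholds(model: torch.nn.Module, global_tau: dict[str, float]) -> None:
    """Client side: sparsify the local model with the broadcast global thresholds."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in global_tau:
                param.mul_(prune_mask(param, global_tau[name]))
```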
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters [31.55846326336193]
Graph HyperNetworks (GHNs) have recently shown strong performance in initializing large vision models.
LoGAH allows us to predict the parameters of 774-million-parameter neural networks in a memory-efficient manner.
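A toy sketch of the hypernetwork idea, using a low-rank decoder to predict one linear layer's weights from a layer embedding; real GHN/LoGAH models condition on the target network's computational graph, which is omitted here, and all names and sizes below are assumptions.

```python
# Toy hypernetwork with a low-rank weight decoder (illustration of the general idea).
import torch
import torch.nn as nn

class TinyHyperNet(nn.Module):
    def __init__(self, embed_dim=32, target_in=128, target_out=128, rank=8):
        super().__init__()
        # Predict two thin factors instead of the full (target_out x target_in) matrix.
        self.to_a = nn.Linear(embed_dim, target_out * rank)
        self.to_b = nn.Linear(embed_dim, rank * target_in)
        self.rank, self.t_in, self.t_out = rank, target_in, target_out

    def forward(self, layer_embedding: torch.Tensor) -> torch.Tensor:
        a = self.to_a(layer_embedding).view(self.t_out, self.rank)
        b = self.to_b(layer_embedding).view(self.rank, self.t_in)
        return a @ b    # predicted weight for the target layer
```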
arXiv Detail & Related papers (2024-05-25T15:56:15Z)
- From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, \textbf{DEFT}, can consistently reduce activation density by up to \textbf{44.94\%} on RoBERTa$_\mathrm{Large}$ and by \textbf{53.19\%} (encoder density) and \textbf{90.60\%} (decoder density) on Flan-T5$_\mathrm{XXL}$.
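As an illustration of the general shape of such an objective, the sketch below adds a differentiable sparsity surrogate on hidden activations to the task loss; DEFT's actual density loss and how it combines with PEFT modules may differ.

```python
# Illustrative activation-sparsity regularizer (not DEFT's exact loss).
import torch

def activation_density(acts: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Fraction of (near-)nonzero activations; lower means sparser (for monitoring)."""
    return (acts.abs() > eps).float().mean()

def sparsity_regularized_loss(task_loss: torch.Tensor,
                              hidden_states: list[torch.Tensor],
                              lam: float = 0.01) -> torch.Tensor:
    # Differentiable surrogate: penalize mean |activation| to push density down.
    penalty = sum(h.abs().mean() for h in hidden_states) / len(hidden_states)
    return task_loss + lam * penalty
```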
arXiv Detail & Related papers (2024-02-02T21:25:46Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E$^2$VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
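A plain visual-prompt-tuning sketch is shown below: learnable prompt tokens are prepended to the patch sequence of a frozen backbone and only the prompts and classifier head are trained. E$^2$VPT additionally introduces key-value prompts inside attention and prunes redundant prompts, which this sketch omits; the class name and sizes are assumptions.

```python
# Basic visual prompt tuning on a frozen transformer backbone (illustrative).
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim=768, num_prompts=20, num_classes=1000):
        super().__init__()
        self.backbone = backbone                       # assumed to map (B, N, D) tokens to (B, N, D)
        for p in self.backbone.parameters():
            p.requires_grad = False                    # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # trainable classifier

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        feats = self.backbone(tokens)                  # frozen transformer blocks
        return self.head(feats.mean(dim=1))            # pool and classify
```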
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Go Wider Instead of Deeper [11.4541055228727]
We propose a framework to deploy trainable parameters efficiently, by going wider instead of deeper.
Our best model outperforms Vision Transformer (ViT) by $1.46\%$ with $0.72\times$ trainable parameters.
Our framework can still surpass ViT and ViT-MoE by $0.83\%$ and $2.08\%$, respectively.
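A loose sketch of the wider-not-deeper idea: one block's parameters are reused at every depth, while the FFN is widened into a (soft) mixture of experts so capacity grows without extra layers. The block layout and routing below are assumptions rather than the paper's exact architecture.

```python
# Parameter sharing across depth plus a widened mixture-of-experts FFN (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWideBlock(nn.Module):
    def __init__(self, d_model=384, n_heads=6, d_ff=1536, num_experts=4, depth=12):
        super().__init__()
        self.depth = depth
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts))

    def moe_ffn(self, x):                                                 # token-wise soft routing
        gates = F.softmax(self.router(x), dim=-1)                         # (B, N, E)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, N, D, E)
        return torch.einsum('bne,bnde->bnd', gates, expert_outs)

    def forward(self, x):
        for _ in range(self.depth):                                       # same parameters at every depth
            h = self.norm1(x)
            a, _ = self.attn(h, h, h)
            x = x + a
            x = x + self.moe_ffn(self.norm2(x))
        return x
```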
arXiv Detail & Related papers (2021-07-25T14:44:24Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with about 0.2% of the parameters (100k) of large models on popular salient object detection benchmarks.
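A simplified octave-convolution sketch follows, showing the high/low-resolution split and cross-resolution exchange that gOctConv generalizes; the flexible in-stage/cross-stage inputs and channel ratios of the actual module are omitted, and the class name is an assumption.

```python
# Simplified two-resolution (octave-style) convolution, illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleOctConv(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5):
        super().__init__()
        in_lo = int(in_ch * alpha); in_hi = in_ch - in_lo
        out_lo = int(out_ch * alpha); out_hi = out_ch - out_lo
        self.hh = nn.Conv2d(in_hi, out_hi, 3, padding=1)   # high-res -> high-res
        self.hl = nn.Conv2d(in_hi, out_lo, 3, padding=1)   # high-res -> low-res
        self.lh = nn.Conv2d(in_lo, out_hi, 3, padding=1)   # low-res  -> high-res
        self.ll = nn.Conv2d(in_lo, out_lo, 3, padding=1)   # low-res  -> low-res

    def forward(self, x_hi, x_lo):
        # x_hi: (B, C_hi, H, W); x_lo: (B, C_lo, H/2, W/2)
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2, mode='nearest')
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```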
arXiv Detail & Related papers (2020-03-12T07:00:46Z)