ParameterNet: Parameters Are All You Need
- URL: http://arxiv.org/abs/2306.14525v2
- Date: Sun, 14 Jan 2024 12:20:01 GMT
- Title: ParameterNet: Parameters Are All You Need
- Authors: Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu
- Abstract summary: We introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models.
We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs.
The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining.
- Score: 50.150436250355945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale visual pretraining has significantly improved the performance
of large vision models. However, we observe the \emph{low FLOPs pitfall} that
the existing low-FLOPs models cannot benefit from large-scale pretraining. In
this paper, we introduce a novel design principle, termed ParameterNet, aimed
at augmenting the number of parameters in large-scale visual pretraining models
while minimizing the increase in FLOPs. We leverage dynamic convolutions to
incorporate additional parameters into the networks with only a marginal rise
in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage
of large-scale visual pretraining. Furthermore, we extend the ParameterNet
concept to the language domain to enhance inference results while preserving
inference speed. Experiments on the large-scale ImageNet-22K have shown the
superiority of our ParameterNet scheme. For example, ParameterNet-600M can
achieve higher accuracy on ImageNet than the widely-used Swin Transformer
(81.6\% \emph{vs.} 80.9\%) and has much lower FLOPs (0.6G \emph{vs.} 4.5G). In
the language domain, LLaMA-1B enhanced with ParameterNet achieves 2\% higher
accuracy over vanilla LLaMA. The code will be released at
\url{https://parameternet.github.io/}.
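As a rough illustration of the mechanism described above (adding parameters through dynamic convolutions with only a marginal rise in FLOPs), the sketch below mixes M expert kernels with a tiny input-conditioned attention head into a single kernel that is then applied as one ordinary convolution, so parameters grow roughly M-fold while the extra compute stays small. This is a minimal, generic sketch of standard dynamic convolution; the class and argument names are illustrative and it is not the authors' released ParameterNet code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Mixes M expert kernels with input-dependent attention, then runs one conv.

    Parameters grow roughly M times versus a plain conv, while the extra FLOPs
    (global pooling, a tiny MLP, and the kernel mixing) are marginal.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4, reduction=4):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.kernel_size = kernel_size
        # M expert kernels: this is where the additional parameters live.
        self.weight = nn.Parameter(
            0.02 * torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size))
        # Cheap attention head: global average pool -> 2-layer MLP -> softmax over experts.
        hidden = max(in_ch // reduction, 4)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = F.softmax(self.attn(x), dim=1)                    # (B, M)
        # Per-sample mixed kernel: (B, out_ch, in_ch, k, k).
        kernels = torch.einsum('bm,moikl->boikl', alpha, self.weight)
        # Grouped-conv trick: apply a different kernel to each sample in the batch.
        x = x.reshape(1, b * c, h, w)
        kernels = kernels.reshape(b * self.out_ch, self.in_ch,
                                  self.kernel_size, self.kernel_size)
        out = F.conv2d(x, kernels, padding=self.kernel_size // 2, groups=b)
        return out.reshape(b, self.out_ch, h, w)


# Example: roughly 4x the parameters of a plain 3x3 conv, nearly the same FLOPs.
layer = DynamicConv2d(64, 128, num_experts=4)
y = layer(torch.randn(2, 64, 56, 56))   # -> (2, 128, 56, 56)
```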
Related papers
- LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning.
We extend LoRA to multiple scales, dubbed LoRA$^2$ (a minimal sketch of the underlying low-rank adaptation is given after this list).
arXiv Detail & Related papers (2024-08-13T12:31:30Z) - Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis [16.253898272659242]
This study focuses on Transformer-based LLMs, specifically applying low-rank parametrization to feedforward networks (FFNs).
Experiments on the large RefinedWeb dataset show that low-rank parametrization is both efficient (e.g., 2.6$\times$ FFN speed-up with 32% of the parameters) and effective during training.
Motivated by this finding, we develop wide and structured networks that surpass current medium-sized and large Transformers in perplexity and throughput.
arXiv Detail & Related papers (2024-07-13T10:08:55Z) - LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters [31.55846326336193]
Graph HyperNetworks (GHNs) have recently shown strong performance in initializing large vision models.
LoGAH allows us to predict the parameters of 774-million-parameter neural networks in a memory-efficient manner.
arXiv Detail & Related papers (2024-05-25T15:56:15Z) - From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa-Large and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL.
arXiv Detail & Related papers (2024-02-02T21:25:46Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - Full Parameter Fine-tuning for Large Language Models with Limited Resources [55.794732214059806]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training.
We propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage.
arXiv Detail & Related papers (2023-06-16T11:37:15Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - Pretraining a Neural Network before Knowing Its Architecture [2.170169149901781]
Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones.
A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50.
While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks.
arXiv Detail & Related papers (2022-07-20T17:27:50Z) - Go Wider Instead of Deeper [11.4541055228727]
We propose a framework to deploy trainable parameters efficiently, by going wider instead of deeper.
Our best model outperforms Vision Transformer (ViT) by $1.46\%$ with $0.72\times$ trainable parameters.
Our framework can still surpass ViT and ViT-MoE by $0.83\%$ and $2.08\%$, respectively.
arXiv Detail & Related papers (2021-07-25T14:44:24Z) - Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100K) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
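For reference, the low-rank adaptation mechanism behind the LoRA$^2$ entry above (see the note in that item) can be sketched as follows. This is a generic, single-scale LoRA sketch with illustrative names, not code from any of the listed papers; LoRA$^2$'s multi-scale extension is not shown.
```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where only A and B are trained."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # pretrained weights stay frozen
        self.A = nn.Parameter(0.01 * torch.randn(rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())


# Example: wrap one projection of a pretrained model; only A and B are trainable.
lora = LoRALinear(nn.Linear(768, 768), rank=8)
out = lora(torch.randn(4, 768))   # -> (4, 768)
```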
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.