Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared
Pre-trained Language Models
- URL: http://arxiv.org/abs/2310.12818v1
- Date: Thu, 19 Oct 2023 15:13:58 GMT
- Title: Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared
Pre-trained Language Models
- Authors: Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu,
Maosong Sun, Jie Zhou
- Abstract summary: We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
- Score: 109.06052781040916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-shared pre-trained language models (PLMs) have emerged as a
successful approach in resource-constrained environments, enabling substantial
reductions in model storage and memory costs without significant performance
compromise. However, it is important to note that parameter sharing does not
alleviate computational burdens associated with inference, thus impeding its
practicality in situations characterized by limited stringent latency
requirements or computational resources. Building upon neural ordinary
differential equations (ODEs), we introduce a straightforward technique to
enhance the inference efficiency of parameter-shared PLMs. Additionally, we
propose a simple pre-training technique that leads to fully or partially shared
models capable of achieving even greater inference acceleration. The
experimental results demonstrate the effectiveness of our methods on both
autoregressive and autoencoding PLMs, providing novel insights into more
efficient utilization of parameter-shared models in resource-constrained
settings.
Related papers
- Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement [0.7558576228782637]
We propose a framework for efficient Source-Free Domain Adaptation (SFDA)
Our approach introduces an improved paradigm for source-model preparation and target-side adaptation.
We demonstrate that our framework is compatible with various SFDA methods and achieves significant computational efficiency.
arXiv Detail & Related papers (2024-10-03T02:12:03Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Parameter-Efficient Fine-Tuning With Adapters [5.948206235442328]
This research introduces a novel adaptation method utilizing the UniPELT framework as a base.
Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters.
arXiv Detail & Related papers (2024-05-09T01:40:38Z) - LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models [7.926974917872204]
LoRA-SP is a novel approach utilizing randomized half-selective parameter freezing.
LoRA-SP significantly reduces computational and memory requirements without compromising model performance.
arXiv Detail & Related papers (2024-02-28T06:50:10Z) - Simulated Overparameterization [35.12611686956487]
We introduce a novel paradigm called Simulated Overparametrization ( SOP)
SOP proposes a unique approach to model training and inference, where a model with a significantly larger number of parameters is trained in such a way as a smaller, efficient subset of these parameters is used for the actual computation during inference.
We present a novel, architecture agnostic algorithm called "majority kernels", which seamlessly integrates with predominant architectures, including Transformer models.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Do deep neural networks utilize the weight space efficiently? [2.9914612342004503]
Deep learning models like Transformers and Convolutional Neural Networks (CNNs) have revolutionized various domains, but their parameter-intensive nature hampers deployment in resource-constrained settings.
We introduce a novel concept utilizing column space and row space of weight matrices, which allows for a substantial reduction in model parameters without compromising performance.
Our approach applies to both Bottleneck and Attention layers, effectively halving the parameters while incurring only minor performance degradation.
arXiv Detail & Related papers (2024-01-26T21:51:49Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.