A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on
Chinese Instruction Data for Instruction Following Large Language Model
- URL: http://arxiv.org/abs/2304.08109v2
- Date: Tue, 18 Apr 2023 03:08:18 GMT
- Title: A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on
Chinese Instruction Data for Instruction Following Large Language Model
- Authors: Xianghui Sun, Yunjie Ji, Baochang Ma, Xiangang Li
- Abstract summary: The choice of foundational model, training dataset scale, number of learnable parameters, and model training cost are all important factors.
To facilitate the reproduction of the paper's results, the dataset, model and code will be released.
- Score: 8.21938165599387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning of large language models has recently become a crucial
area of research in natural language processing. Due to resource and cost
limitations, several researchers have employed parameter-efficient tuning
techniques, such as LoRA, for instruction tuning and have obtained encouraging
results. Compared with full-parameter fine-tuning, LoRA-based tuning offers
salient benefits in terms of training cost. In this study, we conducted
experimental comparisons between full-parameter fine-tuning and LoRA-based
tuning, using LLaMA as the base model. The experimental results show that the
choice of foundational model, the scale of the training dataset, the number of
learnable parameters, and the training cost are all important factors. We hope
that the conclusions of this paper can provide inspiration for training large
language models, especially for Chinese, and help researchers find a better
trade-off between training cost and model performance. To facilitate the
reproduction of the paper's results, the dataset, model, and code will be
released.
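To make the contrast concrete, below is a minimal sketch of how LoRA-based tuning differs from full-parameter fine-tuning in code, using the Hugging Face transformers and peft libraries. The paper's exact hyperparameters are not given here, so the checkpoint name, adapter rank, scaling factor, and target modules are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch: LoRA-based tuning vs. full-parameter fine-tuning.
# Checkpoint, rank, alpha, and target modules are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a LLaMA-style base model (assumed checkpoint name).
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

# Full-parameter fine-tuning: every weight in the base model stays trainable.
full_trainable = sum(p.numel() for p in base.parameters() if p.requires_grad)
print(f"full fine-tuning trains {full_trainable} parameters")

# LoRA-based tuning: freeze the base model and train only low-rank adapters
# injected into selected projection matrices.
lora_cfg = LoraConfig(
    r=8,                                   # adapter rank (assumption)
    lora_alpha=16,                         # scaling factor (assumption)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumption)
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(base, lora_cfg)
lora_model.print_trainable_parameters()    # typically well under 1% of the full model
```

Either model can then be passed to a standard trainer on the Chinese instruction data; the key difference the paper studies is that the LoRA variant updates only the adapter weights, trading some attainable performance for a much smaller training cost.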
Related papers
- Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper [21.656923341138103]
Our study investigates strategies to enhance the model on new languages in the absence of original training data.
Our experiments with a Chinese Whisper model (for Uyghur and Tibetan) yield better results with a more compact parameter set.
arXiv Detail & Related papers (2024-08-20T09:31:59Z) - The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA [0.0]
This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks.
We assessed three well-known models (RoBERTa, BART, and LLaMA) on their ability to predict molecular properties.
We found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales.
arXiv Detail & Related papers (2024-05-02T02:20:12Z) - MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs).
We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants.
We also introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models [116.50367506746713]
We present an empirical study of scaling LLaVA up to 33B and 65B/70B.
We find that scaling LMMs consistently enhances model performance and improves language capabilities.
We hope that this study makes state-of-the-art LMM research at a larger scale more accessible.
arXiv Detail & Related papers (2023-09-18T17:30:46Z) - MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models [90.99663022952498]
SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models across various tasks.
However, SUPERB incurs high computational costs due to its large datasets and diverse tasks.
We introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models, achieving results comparable to SUPERB at significantly lower computational cost.
arXiv Detail & Related papers (2023-05-30T13:07:33Z) - On the Economics of Multilingual Few-shot Learning: Modeling the
Cost-Performance Trade-offs of Machine Translated and Manual Data [12.638781962950805]
We introduce a framework to evaluate the performance and cost trade-offs between machine-translated and manually-created labelled data.
We illustrate the effectiveness of our framework through a case-study on the TyDIQA-GoldP dataset.
arXiv Detail & Related papers (2022-05-12T20:27:01Z) - Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) is the ability to understand a given text passage and answer questions based on it.
Existing MRC research relies heavily on large models and corpora to improve performance as measured by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z) - Fine-tuning BERT for Low-Resource Natural Language Understanding via
Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)