A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on
Chinese Instruction Data for Instruction Following Large Language Model
- URL: http://arxiv.org/abs/2304.08109v2
- Date: Tue, 18 Apr 2023 03:08:18 GMT
- Title: A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on
Chinese Instruction Data for Instruction Following Large Language Model
- Authors: Xianghui Sun, Yunjie Ji, Baochang Ma, Xiangang Li
- Abstract summary: The selection of the foundational model, training dataset scale, learnable parameter quantity, and model training cost are all important factors.
To facilitate the reproduction of the paper's results, the dataset, model and code will be released.
- Score: 8.21938165599387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, instruction tuning of large language models has become a crucial
area of research in the field of natural language processing. Due to resource and
cost limitations, several researchers have employed parameter-efficient tuning
techniques, such as LoRA, for instruction tuning, and have obtained encouraging
results. In comparison to full-parameter fine-tuning, LoRA-based tuning
demonstrates salient benefits in terms of training costs. In this study, we
undertook experimental comparisons between full-parameter fine-tuning and
LoRA-based tuning methods, utilizing LLaMA as the base model. The experimental
results show that the selection of the foundational model, training dataset
scale, learnable parameter quantity, and model training cost are all important
factors affecting final performance. We hope that the experimental conclusions
of this paper can provide inspiration for training large language models,
especially in the Chinese-language setting, and help researchers find a better
trade-off between training cost and model performance. To facilitate
reproduction of the paper's results, the dataset, model, and code will be
released.
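As a concrete reference point for the comparison above, the following is a minimal sketch of LoRA-based instruction tuning using the Hugging Face transformers and peft libraries. The checkpoint path, target modules, and hyperparameters are illustrative assumptions rather than the paper's exact configuration; full-parameter fine-tuning corresponds to training the unwrapped base model.

    # Illustrative LoRA setup in Python; not the paper's exact recipe.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "path/to/llama-7b"  # placeholder for a LLaMA checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    lora_cfg = LoraConfig(
        r=8,                                  # rank of the low-rank update
        lora_alpha=16,                        # scaling of the update
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only adapter weights require gradients

    # Full-parameter fine-tuning skips get_peft_model() and updates every
    # weight of the base model, at much higher memory and optimizer-state cost.

Either variant can then be trained on the instruction data with a standard transformers Trainer loop.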
Related papers
- Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling [87.17041933863041]
We introduce a Response-conditioned Bradley-Terry (Rc-BT) model that improves the reward model's ability to mitigate length bias and follow length instructions.
We also propose the Rc-DPO algorithm, which leverages the Rc-BT model for direct preference optimization (DPO) of large language models; a sketch of the standard Bradley-Terry objective that such reward models build on appears after this list.
arXiv Detail & Related papers (2025-02-02T14:50:25Z)
- Optimizing Large Language Models with an Enhanced LoRA Fine-Tuning Algorithm for Efficiency and Robustness in NLP Tasks [1.236974227340167]
This study proposes a large language model optimization method based on an improved LoRA fine-tuning algorithm.
We fine-tune the large language model through a low-rank adaptation strategy, which significantly reduces the consumption of computing resources; a parameter-count sketch illustrating why low-rank adapters are cheap to train also appears after this list.
The improved LoRA algorithm shows significant improvements in accuracy, F1 score, and MCC compared with traditional models.
arXiv Detail & Related papers (2024-12-25T01:10:25Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset; an illustrative sketch of this kind of loss-driven prioritization also follows this list.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
- Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper [21.656923341138103]
Our study investigates strategies to enhance the model on new languages in the absence of original training data.
Our experiments adapting a Chinese Whisper model to Uyghur and Tibetan yield better results with a more compact parameter set.
arXiv Detail & Related papers (2024-08-20T09:31:59Z)
- The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA [0.0]
This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks.
We assessed three well-known models-RoBERTa, BART, and LLaMA-on their ability to predict molecular properties.
We found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales.
arXiv Detail & Related papers (2024-05-02T02:20:12Z)
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs).
We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants.
We also introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z)
- Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R [1.9799527196428242]
We evaluate the PEFT methods LoRA, Compacter, and IA3 on large language models of code for code summarization and generation.
Our experiments reveal that LoRA consistently outperforms Compacter and IA3 in all settings.
Our study can direct future research in developing code intelligence tasks for unseen languages, including R.
arXiv Detail & Related papers (2024-03-16T03:12:45Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) reveals the ability to understand a given text passage and answer questions based on it.
Existing research works in MRC rely heavily on large-size models and corpus to improve the performance evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z)
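The Rc-BT entry above builds on pairwise reward modelling. As a point of reference, reward models for preference learning are commonly trained with the Bradley-Terry objective sketched below (assuming PyTorch); the response-conditioned variant proposed in that paper is not reproduced here.

    import torch
    import torch.nn.functional as F

    def bradley_terry_loss(reward_chosen: torch.Tensor,
                           reward_rejected: torch.Tensor) -> torch.Tensor:
        # Standard Bradley-Terry pairwise loss for reward modelling:
        # maximise the probability that the preferred (chosen) response
        # receives a higher scalar reward than the rejected one.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()

Rc-BT additionally conditions this comparison on length-related information to disentangle length bias; see that paper for the exact formulation.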
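The cost advantage of LoRA, referenced in several entries above, comes from replacing a full weight update with two small low-rank factors. The numbers below (a 4096x4096 projection, rank 8) are illustrative assumptions, not any paper's reported configuration.

    def lora_trainable_params(d_out: int, d_in: int, r: int) -> tuple[int, int]:
        # LoRA freezes the dense weight W (d_out x d_in) and trains only the
        # low-rank factors B (d_out x r) and A (r x d_in), so the learned
        # update B @ A has rank at most r.
        full_update = d_out * d_in
        lora_update = r * (d_out + d_in)
        return full_update, lora_update

    full, lora = lora_trainable_params(4096, 4096, r=8)
    print(full, lora, lora / full)
    # 16777216  65536  0.00390625  -> roughly 0.4% of the per-layer parameters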
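The LFR entry above describes revisiting challenging regions of the training data. The sketch below illustrates the general idea of loss-driven block sampling only; it is not the LFR algorithm from that paper.

    import math
    import random

    def pick_next_block(recent_loss: dict[int, float],
                        temperature: float = 1.0) -> int:
        # Illustrative loss-driven sampling: data blocks with higher recent
        # loss ("challenging regions") are revisited with higher probability.
        block_ids = list(recent_loss)
        weights = [math.exp(loss / temperature) for loss in recent_loss.values()]
        return random.choices(block_ids, weights=weights, k=1)[0]

    # After each training step, update recent_loss[block_id] with the new loss
    # so the schedule keeps adapting to the model's learning progress.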