CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
- URL: http://arxiv.org/abs/2312.00407v1
- Date: Fri, 1 Dec 2023 08:02:16 GMT
- Title: CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
- Authors: Kai Lv, Shuo Zhang, Tianle Gu, Shuhao Xing, Jiawei Hong, Keyu Chen,
Xiaoran Liu, Yuqing Yang, Honglin Guo, Tengxiao Liu, Yu Sun, Qipeng Guo, Hang
Yan, Xipeng Qiu
- Abstract summary: CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
- Score: 59.09824823710863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly pivotal in a wide range of
natural language processing tasks. Access to pre-trained models, courtesy of
the open-source community, has made it possible to adapt these models to
specific applications for enhanced performance. However, the substantial
resources required for training these models necessitate efficient solutions.
This paper introduces CoLLiE, an efficient library that facilitates
collaborative training of large language models using 3D parallelism,
parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion,
Adan, Sophia, LOMO and AdaLomo. With its modular design and comprehensive
functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and
customization. CoLLiE has proven superior training efficiency in comparison
with prevalent solutions in pre-training and fine-tuning scenarios.
Furthermore, we provide an empirical evaluation of the correlation between
model size and GPU memory consumption under different optimization methods, as
well as an analysis of the throughput. Lastly, we carry out a comprehensive
comparison of various optimizers and PEFT methods within the instruction-tuning
context. CoLLiE is available at https://github.com/OpenLMLab/collie.
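The abstract's observation about model size, optimizer choice, and GPU memory can be made concrete with a back-of-the-envelope accounting. The sketch below is a rough illustration only, assuming common mixed-precision bookkeeping (2 bytes per fp16 weight, 2 per fp16 gradient, 12 bytes of fp32 AdamW state per trainable parameter, and essentially no stored gradients or optimizer states for a fused LOMO-style update); it ignores activations, buffers, and the sharding effects of 3D parallelism, and the function name and numbers are illustrative rather than CoLLiE's measured results.

```python
def estimate_training_memory_gb(n_params_billion: float,
                                optimizer: str = "adamw",
                                trainable_fraction: float = 1.0) -> float:
    """Rough memory for weights, gradients, and optimizer states in
    mixed-precision training; activations and parallelism are ignored."""
    n = n_params_billion * 1e9
    trainable = n * trainable_fraction           # e.g. ~0.01 for a LoRA-style PEFT setup

    weights = 2 * n                              # fp16/bf16 model weights
    if optimizer == "lomo":
        grads, states = 0.0, 0.0                 # fused update: gradients applied layer by layer
    elif optimizer == "sgd":
        grads, states = 2 * trainable, 8 * trainable   # fp32 master weights + momentum
    else:                                        # "adamw" and similar
        grads, states = 2 * trainable, 12 * trainable  # fp32 master weights + m + v

    return (weights + grads + states) / 1024**3


# A 7B model: roughly 104 GB with full AdamW fine-tuning, about 13 GB with
# LOMO, and about 14 GB with AdamW restricted to a ~1% trainable PEFT subset.
for opt, frac in [("adamw", 1.0), ("lomo", 1.0), ("adamw", 0.01)]:
    print(f"{opt:6s} trainable={frac:5.2f}: "
          f"{estimate_training_memory_gb(7, opt, frac):6.1f} GB")
```

Real measurements such as those reported in the paper will differ, since they also include activations and depend on batch size, sequence length, and the chosen parallelism layout.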
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- Bucket Pre-training is All You Need [9.332544709626875]
Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks.
The conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies.
We propose a multi-bucket data composition method that moves beyond the fixed-length paradigm, offering a more flexible and efficient approach to pretraining.
arXiv Detail & Related papers (2024-07-10T09:27:23Z)
- MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations [6.919817502555546]
In this paper, we propose MIREncoder, a Multi-modal IR-based Auto-Encoder that can be pre-trained to generate a learned embedding space.
A multi-modal approach enables us to better extract features from compilable programs.
Our evaluations show that the proposed approach can outperform the state of the art while reducing overhead.
arXiv Detail & Related papers (2024-07-02T13:00:19Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Efficient Parallelization Layouts for Large-Scale Distributed Model Training [17.16249954009967]
We conduct a comprehensive study of possible training configurations for large language models.
We find that using a micro-batch size of 1 usually enables the most efficient training layouts.
Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes.
arXiv Detail & Related papers (2023-11-09T18:59:38Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models by augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
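The eP-ALM entry above packs a fairly concrete recipe into one sentence: keep the language model frozen, train a single linear projection from the perceptual encoder into the LM's embedding space, and prepend one trainable token. The PyTorch sketch below illustrates that setup under assumed dimensions and names (PerceptualPrefix, freeze_backbone, vision_dim, and lm_embed_dim are illustrative and not taken from the eP-ALM code).

```python
import torch
import torch.nn as nn


class PerceptualPrefix(nn.Module):
    """Minimal sketch: the only trainable pieces are one linear projection
    and one learnable prefix token; the language model itself stays frozen."""

    def __init__(self, vision_dim: int = 1024, lm_embed_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_embed_dim)              # trainable projection
        self.prefix = nn.Parameter(torch.zeros(1, 1, lm_embed_dim))  # one trainable token

    def forward(self, vision_features: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # vision_features: (batch, vision_dim); text_embeds: (batch, seq, lm_embed_dim)
        visual_token = self.proj(vision_features).unsqueeze(1)
        prefix = self.prefix.expand(text_embeds.size(0), -1, -1)
        # Prepend the trainable prefix and the projected perceptual token to the
        # frozen LM's input embeddings; the LM parameters are never updated.
        return torch.cat([prefix, visual_token, text_embeds], dim=1)


def freeze_backbone(lm: nn.Module) -> None:
    """Freeze every LM parameter so that well over 99% of the total is untrained."""
    for p in lm.parameters():
        p.requires_grad = False
```

Only the parameters of PerceptualPrefix would be handed to the optimizer, which is what keeps the trainable fraction below 1% in this kind of setup.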
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training [17.556432199389615]
Slapo is a schedule language that decouples the execution of a tensor-level operator from its arithmetic definition.
We show that Slapo can improve training throughput by up to 2.92x on a single machine with 8 NVIDIA V100 GPUs.
arXiv Detail & Related papers (2023-02-16T00:34:53Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models, but fine-tuned models are often shared without the data they were trained on.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters.
We also present new training methods to improve MoE sample efficiency and leverage expert pruning strategy to improve time efficiency.
A 10-billion-parameter model trained on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)