MindLLM: Pre-training Lightweight Large Language Model from Scratch,
Evaluations and Domain Applications
- URL: http://arxiv.org/abs/2310.15777v2
- Date: Sun, 29 Oct 2023 01:17:53 GMT
- Title: MindLLM: Pre-training Lightweight Large Language Model from Scratch,
Evaluations and Domain Applications
- Authors: Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang
Liu, Heyan Huang, Yang Gao
- Abstract summary: We present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch.
A thorough account of experiences accrued during large model development is given, covering every step of the process.
MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks.
- Score: 46.337078949637345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across
various natural language tasks, marking significant strides towards general
artificial intelligence. While progress towards general artificial intelligence is
typically pursued by developing increasingly large-scale models, another branch is
to develop lightweight custom models that better serve certain domains, given the
high cost of training and deploying LLMs and the scarcity of resources. In this
paper, we present MindLLM, a novel series of bilingual
lightweight large language models, trained from scratch, alleviating such
burdens by offering models with 1.3 billion and 3 billion parameters. A
thorough account of experiences accrued during large model development is
given, covering every step of the process, including data construction, model
architecture, evaluation, and applications. Such insights are hopefully
valuable for fellow academics and developers. MindLLM consistently matches or
surpasses the performance of other open-source larger models on some public
benchmarks. We also introduce an innovative instruction tuning framework
tailored for smaller models to enhance their capabilities efficiently.
Moreover, we explore the application of MindLLM in specific vertical domains
such as law and finance, underscoring the agility and adaptability of our
lightweight models.
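The abstract mentions an instruction tuning framework tailored for smaller models but gives no implementation detail on this page. As a point of reference only, here is a minimal sketch of generic supervised instruction tuning for a small causal language model, assuming the Hugging Face transformers and PyTorch APIs; the stand-in model name ("gpt2"), the toy instruction/response pairs, and the hyperparameters are illustrative assumptions, not MindLLM's released framework.

```python
# Minimal supervised instruction-tuning sketch (illustrative; not MindLLM's actual framework).
# Assumptions: a small causal LM ("gpt2" as a stand-in for a ~1-3B model), a toy in-memory
# dataset, and plain teacher-forced cross-entropy on "instruction + response" strings.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in for a lightweight bilingual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction/response pairs; a real framework would curate and filter these.
pairs = [
    ("Summarize: LLMs are costly to train and deploy.", "Large models are expensive to serve."),
    ("Translate to French: good morning", "bonjour"),
]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for instruction, response in pairs:
    text = f"### Instruction:\n{instruction}\n### Response:\n{response}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Setting labels equal to input_ids gives standard next-token cross-entropy.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {outputs.loss.item():.4f}")
```

In practice, instruction tuning for small models usually masks the instruction tokens out of the loss and mixes many task types; the sketch omits that for brevity.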
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- Apple Intelligence Foundation Language Models [109.60033785567484]
This report describes the model architecture, the data used to train the model, the training process, and the evaluation results.
We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
arXiv Detail & Related papers (2024-07-29T18:38:49Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
With evaluation across different benchmarks and a proper training strategy, even a 2.7B small-scale model can perform on par with larger models of 7B or 13B parameters (a generic distillation-loss sketch appears after this list).
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- GlórIA - A Generative and Open Large Language Model for Portuguese [4.782288068552145]
We introduce GlórIA, a robust European Portuguese decoder LLM.
To pre-train GlórIA, we assembled a comprehensive PT-PT text corpus comprising 35 billion tokens from various sources.
Evaluation shows that GlórIA significantly outperforms existing open PT decoder models in language modeling.
arXiv Detail & Related papers (2024-02-20T12:36:40Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To mark the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-off between compute cost and performance when scaling vision-language models (a minimal top-k routing sketch appears after this list).
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models [10.086015702323971]
We follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment.
We start from ground zero by pre-training multiple domain-specific multilingual LMs that are a better fit for contractual and regulatory text.
We present benchmark results of such models on a half-public, half-private legal benchmark comprising 5 downstream tasks, showing the impact of larger model size.
arXiv Detail & Related papers (2022-10-24T10:08:59Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese [33.83704598544326]
Mengzi stands for a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants.
Compared with public Chinese PLMs, Mengzi is simple yet more powerful.
Our lightweight model has achieved new state-of-the-art results on the widely-used CLUE benchmark.
arXiv Detail & Related papers (2021-10-13T13:14:32Z)
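The LLAVADI entry above names knowledge distillation, but its exact objective is not given in the summary. The snippet below is a minimal sketch of a generic distillation loss (temperature-scaled KL divergence between teacher and student logits blended with cross-entropy), assuming PyTorch; the function name, temperature, and mixing weight are illustrative, not the paper's recipe.

```python
# Generic knowledge-distillation loss (illustrative; not LLAVADI's exact recipe):
# temperature-scaled KL divergence between teacher and student logits,
# blended with the standard cross-entropy on ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: the student matches the teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random tensors standing in for model outputs.
student = torch.randn(4, 10)   # small (student) model logits
teacher = torch.randn(4, 10)   # large (teacher) model logits
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels).item())
```

The temperature T softens both distributions so the student can learn from the teacher's relative preferences over non-target classes, and alpha balances the soft and hard targets.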
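Likewise, the sparse mixture-of-experts entry above gives no architectural detail; below is a minimal sketch of a textbook top-k sparsely gated MoE layer in PyTorch, not the vision-language architecture of that paper. The class name, expert width, and routing scheme are illustrative assumptions.

```python
# Minimal top-k sparsely gated mixture-of-experts layer (textbook version;
# not the architecture of the vision-language MoE paper summarized above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, dim)
        scores = self.gate(x)                            # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # each token picks its top-k experts
        weights = F.softmax(weights, dim=-1)             # normalize the selected scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 32)             # 8 tokens with hidden size 32
print(SparseMoE(dim=32)(tokens).shape)  # torch.Size([8, 32])
```

Because each token activates only top_k experts, total parameter count grows with the number of experts while per-token compute stays roughly constant, which is the compute/performance trade-off the summary refers to.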