MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
- URL: http://arxiv.org/abs/2402.16840v1
- Date: Mon, 26 Feb 2024 18:59:03 GMT
- Title: MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
- Authors: Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M.
Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan
- Abstract summary: "Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
- Score: 87.4910758026772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Bigger the better" has been the predominant trend in recent Large Language
Models (LLMs) development. However, LLMs do not suit well for scenarios that
require on-device processing, energy efficiency, low memory footprint, and
response efficiency. These requisites are crucial for privacy, security, and
sustainable deployment. This paper explores the "less is more" paradigm by
addressing the challenge of designing accurate yet efficient Small Language
Models (SLMs) for resource constrained devices. Our primary contribution is the
introduction of an accurate and fully transparent open-source 0.5 billion
(0.5B) parameter SLM, named MobiLlama, catering to the specific needs of
resource-constrained computing with an emphasis on enhanced performance with
reduced resource demands. MobiLlama is a SLM design that initiates from a
larger model and applies a careful parameter sharing scheme to reduce both the
pre-training and the deployment cost. Our work strives to not only bridge the
gap in open-source SLMs but also ensures full transparency, where complete
training data pipeline, training code, model weights, and over 300 checkpoints
along with evaluation codes is available at :
https://github.com/mbzuai-oryx/MobiLlama.
Related papers
- Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent [15.463595798992621]
Large language models (LLMs) have revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks.
Existing solutions make the unrealistic assumption that the entire model is exchanged for training.
We introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption.
arXiv Detail & Related papers (2024-06-17T03:49:44Z) - REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models [10.684722193666607]
We introduce REQUAL-LM, a novel method for finding reliable and equitable large language models (LLMs) outputs through aggregation.
Specifically, we develop a Monte Carlo method based on repeated sampling to find a reliable output close to the mean of the underlying distribution of possible outputs.
We formally define the terms such as reliability and bias, and design an equity-aware aggregation to minimize harmful bias while finding a highly reliable output.
arXiv Detail & Related papers (2024-04-17T22:12:41Z) - Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models [21.864109456867784]
For many downstream tasks, it is necessary to fine-tune large language models (LLMs) using private data.
We propose an automated federated pipeline, named FedPipe, to fine-tune LLMs with minimal training cost.
Extensive experiments demonstrate that FedPipe expedites the model training and achieves higher accuracy than state-of-the-art benchmarks.
arXiv Detail & Related papers (2024-04-09T16:50:30Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLMs families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN)
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs
for Embodied AI [10.82017289243097]
Large Language Models (LLMs) are capable of reasoning over diverse input data modalities through pre-trained encoders.
m-LLM improves the task accuracy by up to 4% compared to the best existing scheme.
arXiv Detail & Related papers (2023-12-13T04:08:59Z) - Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs [67.38165028487242]
We introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach to fine-tune large language models (LLMs)
Inspired by the Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs.
Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient training-free manner and open new venues to scale the great potential of sparsity to LLMs.
arXiv Detail & Related papers (2023-10-13T07:38:52Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.