MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
- URL: http://arxiv.org/abs/2402.16840v1
- Date: Mon, 26 Feb 2024 18:59:03 GMT
- Title: MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
- Authors: Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M.
Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan
- Abstract summary: "Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
- Score: 87.4910758026772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Bigger the better" has been the predominant trend in recent Large Language
Models (LLMs) development. However, LLMs do not suit well for scenarios that
require on-device processing, energy efficiency, low memory footprint, and
response efficiency. These requisites are crucial for privacy, security, and
sustainable deployment. This paper explores the "less is more" paradigm by
addressing the challenge of designing accurate yet efficient Small Language
Models (SLMs) for resource constrained devices. Our primary contribution is the
introduction of an accurate and fully transparent open-source 0.5 billion
(0.5B) parameter SLM, named MobiLlama, catering to the specific needs of
resource-constrained computing with an emphasis on enhanced performance with
reduced resource demands. MobiLlama is a SLM design that initiates from a
larger model and applies a careful parameter sharing scheme to reduce both the
pre-training and the deployment cost. Our work strives to not only bridge the
gap in open-source SLMs but also ensures full transparency, where complete
training data pipeline, training code, model weights, and over 300 checkpoints
along with evaluation codes is available at :
https://github.com/mbzuai-oryx/MobiLlama.
Related papers
- CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration [1.6021932740447968]
Large Language Models (LLMs) have achieved remarkable success in serving end-users with human-like intelligence.
LLMs demand high computational resources, making it challenging to deploy them to satisfy various performance objectives.
We introduce CE-CoLLM, a novel cloud-edge collaboration framework that supports efficient and adaptive LLM inference for end-users at the edge.
arXiv Detail & Related papers (2024-11-05T06:00:27Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z) - LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce textbfLanguage textbfModel textbfGuided textbfTrade-offs (i.e., textbfLMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z) - Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent [15.463595798992621]
Large language models (LLMs) have revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks.
Existing solutions make the unrealistic assumption that the entire model is exchanged for training.
We introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption.
arXiv Detail & Related papers (2024-06-17T03:49:44Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLMs families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z) - Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs
for Embodied AI [10.82017289243097]
Large Language Models (LLMs) are capable of reasoning over diverse input data modalities through pre-trained encoders.
m-LLM improves the task accuracy by up to 4% compared to the best existing scheme.
arXiv Detail & Related papers (2023-12-13T04:08:59Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.