Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
- URL: http://arxiv.org/abs/2310.01434v1
- Date: Fri, 29 Sep 2023 16:30:49 GMT
- Title: Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
- Authors: Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo
- Abstract summary: This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity.
The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as little as 4 GB of memory.
Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Artificial Intelligence has witnessed remarkable progress in
recent years, especially with the emergence of powerful large language models
(LLMs) based on the transformer architecture. Cloud-based LLMs, such as
OpenAI's ChatGPT, offer impressive capabilities but come with concerns
regarding latency and privacy due to network dependencies. This article
presents an innovative approach to LLM inference, envisioning a future where
LLMs with billions of parameters can be executed directly on mobile devices
without network connectivity. The article showcases a fine-tuned GPT LLM with 3
billion parameters that can operate smoothly on devices with as little as 4 GB of
memory. Through the integration of native code and model quantization
techniques, the application not only serves as a general-purpose assistant but
also facilitates seamless mobile interactions with text-to-actions features.
The article provides insights into the training pipeline, implementation
details, test results, and future directions of on-device LLM inference. This
breakthrough technology opens up possibilities for empowering users with
sophisticated AI capabilities while preserving their privacy and eliminating
latency concerns.
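The abstract does not specify the exact quantization scheme, so the sketch below is only a generic illustration of why low-bit weights make the 4 GB figure plausible: it estimates the weight footprint of a 3-billion-parameter model at several precisions and shows a minimal symmetric int4 quantize/dequantize round trip. The function names and the per-tensor scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Rough weight-memory footprint of a 3B-parameter model at different
# precisions (weights only; real runtimes also need activations, the
# KV cache, and the runtime itself).
params = 3e9
for bits, label in [(32, "fp32"), (16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"{label}: ~{params * bits / 8 / 2**30:.1f} GiB of weights")
# int4 weights for 3B parameters come to roughly 1.4 GiB, which is why
# aggressive quantization makes a 4 GB device a plausible target.

# Minimal symmetric per-tensor 4-bit quantization round trip
# (illustrative scheme, not the one used in the paper).
def quantize_int4(w: np.ndarray):
    scale = np.abs(w).max() / 7.0                  # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int4(w)
print("max abs reconstruction error:", np.abs(w - dequantize_int4(q, scale)).max())
```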
Related papers
- MiniCPM-V: A GPT-4V Level MLLM on Your Phone [83.10007643273521]
MiniCPM-V is a series of efficient MLLMs deployable on end-side devices.
By integrating the latest MLLM techniques in architecture, pretraining and alignment, MiniCPM-V 2.5 has several notable features.
MiniCPM-V can be viewed as a representative example of a promising trend.
arXiv Detail & Related papers (2024-08-03T15:02:21Z)
- Mobile Edge Intelligence for Large Language Models: A Contemporary Survey [32.22789677882933]
Mobile edge intelligence (MEI) provides AI capabilities within the edge of mobile networks with improved privacy and latency relative to cloud computing.
MEI sits between on-device AI and cloud-based AI, featuring wireless communications and more powerful computing resources than end devices.
This article provides a contemporary survey on harnessing MEI for LLMs.
arXiv Detail & Related papers (2024-07-09T13:47:05Z)
- Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks [11.509880721677156]
Large language models (LLMs) have recently emerged, demonstrating near-human-level performance in cognitive tasks.
We propose the concept of "generative AI-in-the-loop".
We believe that combining LLMs and ML models allows both to leverage their respective capabilities and achieve better results than either model alone.
arXiv Detail & Related papers (2024-06-06T17:25:07Z)
- Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities [36.711166825551715]
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities.
This work aims to provide a comprehensive overview of LLM-enabled telecom networks.
arXiv Detail & Related papers (2024-05-17T14:46:13Z)
- Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information.
We evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants.
Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z)
- When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers.
We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
arXiv Detail & Related papers (2024-01-15T15:20:59Z)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
- Video Understanding with Large Language Models: A Survey [97.29126722004949]
Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding.
The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity reasoning.
This survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs.
arXiv Detail & Related papers (2023-12-29T01:56:17Z)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
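For intuition only, here is a self-contained toy of the seed-based zeroth-order idea summarized above: the client sends back just an integer seed and one scalar per step, and the server regenerates the same perturbation direction from that seed. It uses a small quadratic objective and made-up hyper-parameters; it is not FedKSeed's actual algorithm.

```python
import numpy as np

DIM, EPS, LR, STEPS = 50, 1e-3, 0.02, 500
rng = np.random.default_rng(0)
target = rng.normal(size=DIM)        # stand-in for the client's data/task
theta = np.zeros(DIM)                # model state shared by server and client

def loss(w):                         # toy quadratic objective
    return 0.5 * np.sum((w - target) ** 2)

def client_step(w, seed):
    """Uplink payload is just (seed, scalar): a directional-derivative
    estimate along a direction fully determined by the seed."""
    z = np.random.default_rng(seed).normal(size=w.shape)
    g = (loss(w + EPS * z) - loss(w - EPS * z)) / (2 * EPS)
    return seed, g

def server_apply(w, seed, g):
    """Server rebuilds the same direction from the seed and updates."""
    z = np.random.default_rng(seed).normal(size=w.shape)
    return w - LR * g * z

for step in range(STEPS):
    seed, g = client_step(theta, seed=step)
    theta = server_apply(theta, seed, g)

print("loss after training:", loss(theta))   # should be close to zero
```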
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
- Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training [18.526329975259483]
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks.
It is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets.
We propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices.
arXiv Detail & Related papers (2023-11-22T13:20:59Z)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z)