Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT
LLM on Mobile
- URL: http://arxiv.org/abs/2310.01434v1
- Date: Fri, 29 Sep 2023 16:30:49 GMT
- Title: Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT
LLM on Mobile
- Authors: Samuel Carreira, Tom\'as Marques, Jos\'e Ribeiro, Carlos Grilo
- Abstract summary: This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity.
The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory.
Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Artificial Intelligence has witnessed remarkable progress in
recent years, especially with the emergence of powerful large language models
(LLMs) based on the transformer architecture. Cloud-based LLMs, such as
OpenAI's ChatGPT, offer impressive capabilities but come with concerns
regarding latency and privacy due to network dependencies. This article
presents an innovative approach to LLM inference, envisioning a future where
LLMs with billions of parameters can be executed directly on mobile devices
without network connectivity. The article showcases a fine-tuned GPT LLM with 3
billion parameters that can operate smoothly on devices with as low as 4GB of
memory. Through the integration of native code and model quantization
techniques, the application not only serves as a general-purpose assistant but
also facilitates seamless mobile interactions with text-to-actions features.
The article provides insights into the training pipeline, implementation
details, test results, and future directions of on-device LLM inference. This
breakthrough technology opens up possibilities for empowering users with
sophisticated AI capabilities while preserving their privacy and eliminating
latency concerns.
Related papers
- Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks [11.509880721677156]
Large language models (LLMs) have recently emerged, demonstrating near-human-level performance in cognitive tasks.
We propose the concept of "generative AI-in-the-loop"
We believe that combining LLMs and ML models allows both to leverage their respective capabilities and achieve better results than either model alone.
arXiv Detail & Related papers (2024-06-06T17:25:07Z) - Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities [36.711166825551715]
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities.
This work aims to provide a comprehensive overview of LLM-enabled telecom networks.
arXiv Detail & Related papers (2024-05-17T14:46:13Z) - Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information.
We evaluate the capability of state-of-art LLMs to be used as Question Answering (QA) assistants.
Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z) - Unmemorization in Large Language Models via Self-Distillation and
Deliberate Imagination [58.36408867180233]
Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data.
We introduce a novel approach termed deliberate imagination in the context of LLM unlearning.
Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning.
arXiv Detail & Related papers (2024-02-15T16:21:14Z) - When Large Language Model Agents Meet 6G Networks: Perception,
Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers.
We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
arXiv Detail & Related papers (2024-01-15T15:20:59Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN)
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - Video Understanding with Large Language Models: A Survey [97.29126722004949]
Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding.
The emergent capabilities Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity reasoning.
This survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs.
arXiv Detail & Related papers (2023-12-29T01:56:17Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Confidant: Customizing Transformer-based LLMs via Collaborative Edge
Training [18.526329975259483]
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks.
It is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets.
We propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices.
arXiv Detail & Related papers (2023-11-22T13:20:59Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - Pushing Large Language Models to the 6G Edge: Vision, Challenges, and
Opportunities [32.035405009895264]
Large language models (LLMs) are revolutionizing AI development and potentially shaping our future.
The status quo cloud-based deployment faces some critical challenges: 1) long response time; 2) high bandwidth costs; and 3) the violation of data privacy.
6G mobile edge computing (MEC) systems may resolve these pressing issues.
This article serves as a position paper for thoroughly identifying the motivation, challenges, and pathway for empowering LLMs at the 6G edge.
arXiv Detail & Related papers (2023-09-28T06:22:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.