Related papers: Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile

URL: http://arxiv.org/abs/2310.01434v1
Date: Fri, 29 Sep 2023 16:30:49 GMT
Title: Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
Authors: Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo
Abstract summary: This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity. The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory. Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The field of Artificial Intelligence has witnessed remarkable progress in recent years, especially with the emergence of powerful large language models (LLMs) based on the transformer architecture. Cloud-based LLMs, such as OpenAI's ChatGPT, offer impressive capabilities but come with concerns regarding latency and privacy due to network dependencies. This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity. The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory. Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features. The article provides insights into the training pipeline, implementation details, test results, and future directions of on-device LLM inference. This breakthrough technology opens up possibilities for empowering users with sophisticated AI capabilities while preserving their privacy and eliminating latency concerns.
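To illustrate why quantization matters for the 4GB figure mentioned in the abstract, the following is a minimal back-of-the-envelope sketch of weight memory at different precisions. The abstract does not state which bit width the authors used, so the precisions listed here are assumptions chosen for illustration, and the helper name `weight_memory_gib` is hypothetical. The estimate covers weights only and ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# "3 billion parameters" comes from the abstract; the bit widths below are
# illustrative assumptions, not the configuration reported in the paper.
NUM_PARAMS = 3_000_000_000

for label, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, 16-bit weights alone exceed 4 GiB, while 8-bit or 4-bit weights fit with room left for the operating system and runtime state.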

Related papers

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention [117.90128579811014]
PRISM is a framework for distilling small language model (SLM)-enabled robot planners. We apply PRISM to three LLM-enabled planners for mapping and exploration, manipulation, and household assistance. We demonstrate that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93%.
arXiv Detail & Related papers (2025-06-20T21:44:27Z)
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce the PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints. PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
arXiv Detail & Related papers (2025-03-15T15:11:17Z)
Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices [5.926813659185372]
Only small-size large language models (LLMs) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models.
arXiv Detail & Related papers (2025-03-10T16:27:17Z)
LLMs in Mobile Apps: Practices, Challenges, and Opportunities [4.104646810514711]
The integration of AI techniques has become increasingly popular in software development. With the rise of large language models (LLMs) and generative AI, developers now have access to a wealth of high-quality open-source models and APIs from closed-source providers.
arXiv Detail & Related papers (2025-02-21T19:53:43Z)
MiniCPM-V: A GPT-4V Level MLLM on Your Phone [83.10007643273521]
MiniCPM-V is a series of efficient MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, MiniCPM-V 2.5 has several notable features. MiniCPM-V can be viewed as a representative example of a promising trend.
arXiv Detail & Related papers (2024-08-03T15:02:21Z)
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey [32.22789677882933]
Mobile edge intelligence (MEI) provides AI capabilities within the edge of mobile networks with improved privacy and latency relative to cloud computing. MEI sits between on-device AI and cloud-based AI, featuring wireless communications and more powerful computing resources than end devices. This article provides a contemporary survey on harnessing MEI for LLMs.
arXiv Detail & Related papers (2024-07-09T13:47:05Z)
Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks [11.509880721677156]
Large language models (LLMs) have recently emerged, demonstrating near-human-level performance in cognitive tasks. We propose the concept of "generative AI-in-the-loop". We believe that combining LLMs and ML models allows both to leverage their respective capabilities and achieve better results than either model alone.
arXiv Detail & Related papers (2024-06-06T17:25:07Z)
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities [36.711166825551715]
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities. This work aims to provide a comprehensive overview of LLM-enabled telecom networks.
arXiv Detail & Related papers (2024-05-17T14:46:13Z)
Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information. We evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants. Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z)
When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers. We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
arXiv Detail & Related papers (2024-01-15T15:20:59Z)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN). At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
Video Understanding with Large Language Models: A Survey [97.29126722004949]
Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding. The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity reasoning. This survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs.
arXiv Detail & Related papers (2023-12-29T01:56:17Z)
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. FedKSeed employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training [18.526329975259483]
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. It is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. We propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices.
arXiv Detail & Related papers (2023-11-22T13:20:59Z)
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.