Related papers: Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation

URL: http://arxiv.org/abs/2410.03613v1
Date: Fri, 4 Oct 2024 17:14:59 GMT
Title: Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation
Authors: Jie Xiao, Qianyi Huang, Xu Chen, Chen Tian
Abstract summary: Large language models (LLMs) increasingly integrate into every aspect of our work and daily lives. There are growing concerns about user privacy, which push the trend toward local deployment of these models. As a rapidly emerging application, we are concerned about their performance on commercial-off-the-shelf mobile devices.
Score: 10.817783356090027
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As large language models (LLMs) increasingly integrate into every aspect of our work and daily lives, there are growing concerns about user privacy, which push the trend toward local deployment of these models. There are a number of lightweight LLMs (e.g., Gemini Nano, LLAMA2 7B) that can run locally on smartphones, providing users with greater control over their personal data. As a rapidly emerging application, we are concerned about their performance on commercial-off-the-shelf mobile devices. To fully understand the current landscape of LLM deployment on mobile platforms, we conduct a comprehensive measurement study on mobile devices. We evaluate both metrics that affect user experience, including token throughput, latency, and battery consumption, as well as factors critical to developers, such as resource utilization, DVFS strategies, and inference engines. In addition, we provide a detailed analysis of how these hardware capabilities and system dynamics affect on-device LLM performance, which may help developers identify and address bottlenecks for mobile LLM applications. We also provide comprehensive comparisons across the mobile system-on-chips (SoCs) from major vendors, highlighting their performance differences in handling LLM workloads. We hope that this study can provide insights for both the development of on-device LLMs and the design for future mobile system architecture.

Related papers

Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap [51.198001060683296]
Large Language Models (LLMs) offer transformative potential to address transportation challenges. This survey first presents LLM4TR, a novel conceptual framework that systematically categorizes the roles of LLMs in transportation. For each role, our review spans diverse applications, from traffic prediction and autonomous driving to safety analytics and urban mobility optimization.
arXiv Detail & Related papers (2025-03-27T11:56:27Z)
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark [45.28023118459497]
We introduce Mobile-MMLU, a large-scale benchmark dataset tailored for mobile intelligence. It consists of 16,186 questions across 80 mobile-related fields, designed to evaluate LLM performance in realistic mobile scenarios. A challenging subset, Mobile-MMLU-Pro, provides advanced evaluation similar in size to MMLU-Pro but significantly more difficult than our standard full set.
arXiv Detail & Related papers (2025-03-26T17:59:56Z)
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce the PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints. PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
arXiv Detail & Related papers (2025-03-15T15:11:17Z)
Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices [5.926813659185372]
Only small-size large language models (LLMs) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models.
arXiv Detail & Related papers (2025-03-10T16:27:17Z)
LLMs in Mobile Apps: Practices, Challenges, and Opportunities [4.104646810514711]
The integration of AI techniques has become increasingly popular in software development. With the rise of large language models (LLMs) and generative AI, developers now have access to a wealth of high-quality open-source models and APIs from closed-source providers.
arXiv Detail & Related papers (2025-02-21T19:53:43Z)
SlimLM: An Efficient Small Language Model for On-Device Document Assistance [60.971107009492606]
We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist. We evaluate SlimLM against existing SLMs, showing comparable or superior performance.
arXiv Detail & Related papers (2024-11-15T04:44:34Z)
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters. We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms [11.87161637895978]
We introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate large language models on mobile devices. We provide a benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities.
arXiv Detail & Related papers (2024-10-05T03:37:07Z)
On-Device Language Models: A Comprehensive Review [26.759861320845467]
This review examines the challenges of deploying computationally expensive large language models on resource-constrained devices. The paper investigates on-device language models, their efficient architectures, and state-of-the-art compression techniques. Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z)
Performance Law of Large Language Models [58.32539851241063]
Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources without extensive experiments.
arXiv Detail & Related papers (2024-08-19T11:09:12Z)
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and data is not two separate paths but rather interconnected. On the one hand, vaster and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data. To promote data-model co-development for the MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z)
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
Demystifying Platform Requirements for Diverse LLM Inference Use Cases [7.233203254714951]
We present an analytical tool, GenZ, to study the relationship between large language model inference performance and various platform design parameters. We quantify the platform requirements to support SOTA LLMs like LLaMA and GPT-4 under diverse serving settings. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications.
arXiv Detail & Related papers (2024-06-03T18:00:50Z)
MELTing point: Mobile Evaluation of Language Transformers [8.238355633015068]
We explore the current state of mobile execution of Large Language Models (LLMs). We have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device. We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance.
arXiv Detail & Related papers (2024-03-19T15:51:21Z)
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.