MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
- URL: http://arxiv.org/abs/2406.10290v1
- Date: Wed, 12 Jun 2024 22:58:12 GMT
- Title: MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
- Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese
- Abstract summary: We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
- Score: 81.70591346986582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.
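The framework itself ships as a desktop evaluation library plus an iOS measurement app, neither of which is reproduced here. As a rough illustration of the kind of measurement it performs, below is a minimal, hypothetical latency-and-memory harness in Python; the `benchmark_generation` helper and the dummy model are illustrative stand-ins, not MobileAIBench APIs, and on-device hardware utilization is measured by the iOS app rather than by anything shown here.

```python
import time
import tracemalloc
from statistics import mean
from typing import Callable, List


def benchmark_generation(generate: Callable[[str], str],
                         prompts: List[str]) -> dict:
    """Time a text-generation callable over a set of prompts and record
    peak Python-level memory allocation.

    `generate` stands in for any desktop or on-device inference call
    (e.g., a binding to a quantized model runtime); it is a placeholder,
    not part of MobileAIBench's actual API.
    """
    latencies = []
    tracemalloc.start()
    for prompt in prompts:
        start = time.perf_counter()
        _ = generate(prompt)                      # run one inference
        latencies.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
        "peak_alloc_mb": peak_bytes / 2**20,
    }


if __name__ == "__main__":
    # Dummy "model" so the sketch runs end to end; swap in a real
    # quantized-model call to reproduce this style of measurement.
    echo_model = lambda p: p.upper()
    print(benchmark_generation(echo_model, ["hello", "benchmark me"]))
```

Swapping the dummy callable for a real quantized-model runtime yields desktop-side numbers; per the abstract, the on-device latency and hardware-utilization measurements are handled by the framework's iOS app.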
Related papers
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and ability to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
- PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms [11.87161637895978]
We introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate large language models on mobile devices.
We provide a benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities.
arXiv Detail & Related papers (2024-10-05T03:37:07Z)
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation [10.817783356090027]
Large language models (LLMs) are increasingly integrated into every aspect of our work and daily lives.
There are growing concerns about user privacy, which push the trend toward local deployment of these models.
As locally deployed LLMs rapidly emerge as an application, their performance on commercial off-the-shelf mobile devices is a growing concern.
arXiv Detail & Related papers (2024-10-04T17:14:59Z)
- On-Device Language Models: A Comprehensive Review [26.759861320845467]
This review examines the challenges of deploying computationally expensive large language models on resource-constrained devices.
The paper investigates on-device language models, their efficient architectures, and state-of-the-art compression techniques.
Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z)
- Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox [46.39670209441478]
Large language models (LLMs) have exhibited exciting progress in multiple scenarios.
Although quantization is an effective means of reducing memory footprint and inference cost, it suffers from performance degradation at low bit-widths (a minimal sketch of weight quantization is given after this list).
This work provides a comprehensive benchmark suite for this research topic, including an evaluation system, detailed analyses, and a general toolbox.
arXiv Detail & Related papers (2024-06-15T12:02:14Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- A Performance Evaluation of a Quantized Large Language Model on Various Smartphones [0.0]
This paper explores the feasibility and performance of on-device large language model (LLM) inference on various Apple iPhone models.
Leveraging existing literature on running multi-billion parameter LLMs on resource-limited devices, our study examines the thermal effects and interaction speeds of a high-performing LLM.
We present real-world performance results, providing insights into on-device inference capabilities.
arXiv Detail & Related papers (2023-12-19T10:19:39Z)
- MLPerf Mobile Inference Benchmark [11.883357894242668]
MLPerf Mobile is the first industry-standard open-source mobile benchmark developed by industry members and academic researchers.
For the first iteration, we developed an app to provide an "out-of-the-box" inference-performance benchmark for computer vision and natural-language processing on mobile devices.
arXiv Detail & Related papers (2020-12-03T23:29:03Z)
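Several entries above (PalmBench, the quantized-LLM generalization benchmark, and LLMC) center on weight quantization. For readers new to the topic, here is a minimal sketch of plain symmetric per-tensor int8 quantization in Python; it is a generic textbook scheme, not the method used by any paper listed, and real toolkits use far more elaborate weight and activation formats.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127].

    Illustration only; production quantizers use per-channel/group scales,
    calibration data, and lower bit-widths than shown here.
    """
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-8)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    w = np.random.randn(4096).astype(np.float32)
    q, s = quantize_int8(w)
    err = float(np.abs(w - dequantize(q, s)).mean())
    print(f"int8 storage: {q.nbytes} B vs fp32: {w.nbytes} B, "
          f"mean abs rounding error: {err:.5f}")
```

The printout shows the roughly 4x storage saving over fp32 along with the rounding error that grows as bit-width shrinks, which is the compression-versus-accuracy trade-off the benchmarks above characterize.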