MLPerf Mobile Inference Benchmark
- URL: http://arxiv.org/abs/2012.02328v2
- Date: Fri, 26 Feb 2021 14:34:51 GMT
- Title: MLPerf Mobile Inference Benchmark
- Authors: Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai
Nguyen, Ramesh Chukka, Kenneth Shiring, Koan-Sin Tan, Mark Charlebois,
William Chou, Mostafa El-Khamy, Jungwook Hong, Michael Buch, Cindy Trinh,
Thomas Atta-fosu, Fatih Cakir, Masoud Charkhabi, Xiaodong Chen, Jimmy Chiang,
Dave Dexter, Woncheol Heo, Guenther Schmuelling, Maryam Shabani, Dylan Zika
- Abstract summary: MLPerf Mobile is the first industry-standard open-source mobile benchmark developed by industry members and academic researchers.
For the first iteration, we developed an app to provide an "out-of-the-box" inference-performance benchmark for computer vision and natural-language processing on mobile devices.
- Score: 11.883357894242668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: MLPerf Mobile is the first industry-standard open-source mobile benchmark
developed by industry members and academic researchers to allow
performance/accuracy evaluation of mobile devices with different AI chips and
software stacks. The benchmark draws from the expertise of leading mobile-SoC
vendors, ML-framework providers, and model producers. In this paper, we
motivate the drive to demystify mobile-AI performance and present MLPerf
Mobile's design considerations, architecture, and implementation. The benchmark
comprises a suite of models that operate under standardized data sets,
quality metrics, and run rules. For the first iteration, we developed an app to
provide an "out-of-the-box" inference-performance benchmark for computer vision
and natural-language processing on mobile devices. MLPerf Mobile can serve as a
framework for integrating future models, for customizing quality-target
thresholds to evaluate system performance, for comparing software frameworks,
and for assessing heterogeneous-hardware capabilities for machine learning, all
fairly and faithfully with fully reproducible results.
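As an illustration of the kind of measurement the benchmark app performs, the sketch below shows a minimal inference-latency harness: warm-up runs followed by timed runs, reporting mean and tail (p90) latency. This is a hypothetical, self-contained example, not MLPerf Mobile's actual LoadGen code; `run_inference` is a stand-in for a real model invocation (e.g., a TFLite interpreter call).

```python
import time
import statistics

def run_inference(sample):
    # Placeholder for a real on-device model call; simulates work so the
    # harness runs standalone.
    return sum(x * x for x in sample)

def benchmark(samples, warmup=5, runs=50):
    """Measure per-inference latency and report summary statistics in ms."""
    for s in samples[:warmup]:
        run_inference(s)          # warm-up: let caches and clocks settle
    latencies = []
    for i in range(runs):
        s = samples[i % len(samples)]
        start = time.perf_counter()
        run_inference(s)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "p90_ms": latencies[int(0.90 * (len(latencies) - 1))],
    }

stats = benchmark([[float(i)] * 64 for i in range(8)])
```

Real mobile benchmarks additionally pin quality targets (e.g., minimum accuracy) so that latency numbers are only comparable when the model still meets the required quality threshold.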
Related papers
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
- Benchmarking Mobile Device Control Agents across Diverse Configurations [21.164023091324523]
B-MoCA is a novel benchmark for evaluating mobile device control agents.
We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs as well as agents trained from scratch using human expert demonstrations.
arXiv Detail & Related papers (2024-04-25T14:56:32Z)
- MELTing point: Mobile Evaluation of Language Transformers [8.238355633015068]
We explore the current state of mobile execution of Large Language Models (LLMs).
We have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device.
We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance.
arXiv Detail & Related papers (2024-03-19T15:51:21Z)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent.
Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface.
It then autonomously plans and decomposes the complex operation task, and navigates mobile apps through operations step by step.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)
- Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels [95.44077384918725]
We propose to teach large multi-modality models (LMMs) with text-defined rating levels instead of scores.
The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA) and video quality assessment (VQA) tasks.
arXiv Detail & Related papers (2023-12-28T16:10:25Z)
- Mobile Foundation Model as Firmware [13.225478051091763]
sys is a collaborative management approach between the mobile OS and hardware.
It amalgamates a curated selection of publicly available Large Language Models (LLMs) and facilitates dynamic data flow.
It attains accuracy parity in 85% of tasks, demonstrates improved scalability in terms of storage and memory, and offers satisfactory inference speed.
arXiv Detail & Related papers (2023-08-28T07:21:26Z)
- MMBench: Is Your Multi-modal Model an All-around Player? [114.45702807380415]
How to evaluate large vision-language models remains a major obstacle, hindering future model development.
Traditional benchmarks provide quantitative performance measurements but suffer from a lack of fine-grained ability assessment and non-robust evaluation metrics.
Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labor, but they are not scalable and display significant bias.
MMBench is a systematically-designed objective benchmark for robustly evaluating the various abilities of vision-language models.
arXiv Detail & Related papers (2023-07-12T16:23:09Z)
- Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction [28.53259866617677]
We introduce Mobile-Env, a comprehensive toolkit tailored for creating GUI benchmarks in the Android mobile environment.
We collect an open-world task set across various real-world apps and a fixed world set, WikiHow, which captures a significant amount of dynamic online contents.
Our findings reveal that even advanced models struggle with tasks that are relatively simple for humans.
arXiv Detail & Related papers (2023-05-14T12:31:03Z)
- Meta Matrix Factorization for Federated Rating Predictions [84.69112252208468]
Federated recommender systems have distinct advantages in terms of privacy protection over traditional recommender systems.
Previous work on federated recommender systems does not fully consider the limitations of storage, RAM, energy and communication bandwidth in a mobile environment.
Our goal in this paper is to design a novel federated learning framework for rating prediction (RP) for mobile environments.
arXiv Detail & Related papers (2019-10-22T16:29:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.