MLPerf Mobile Inference Benchmark
- URL: http://arxiv.org/abs/2012.02328v2
- Date: Fri, 26 Feb 2021 14:34:51 GMT
- Title: MLPerf Mobile Inference Benchmark
- Authors: Vijay Janapa Reddi, David Kanter, Peter Mattson, Jared Duke, Thai
Nguyen, Ramesh Chukka, Kenneth Shiring, Koan-Sin Tan, Mark Charlebois,
William Chou, Mostafa El-Khamy, Jungwook Hong, Michael Buch, Cindy Trinh,
Thomas Atta-fosu, Fatih Cakir, Masoud Charkhabi, Xiaodong Chen, Jimmy Chiang,
Dave Dexter, Woncheol Heo, Guenther Schmuelling, Maryam Shabani, Dylan Zika
- Abstract summary: MLPerf Mobile is the first industry-standard open-source mobile benchmark developed by industry members and academic researchers.
For the first iteration, we developed an app to provide an "out-of-the-box" inference-performance benchmark for computer vision and natural-language processing on mobile devices.
- Score: 11.883357894242668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: MLPerf Mobile is the first industry-standard open-source mobile benchmark
developed by industry members and academic researchers to allow
performance/accuracy evaluation of mobile devices with different AI chips and
software stacks. The benchmark draws from the expertise of leading mobile-SoC
vendors, ML-framework providers, and model producers. In this paper, we
motivate the drive to demystify mobile-AI performance and present MLPerf
Mobile's design considerations, architecture, and implementation. The benchmark
comprises a suite of models that operate under standard data sets,
quality metrics, and run rules. For the first iteration, we developed an app to
provide an "out-of-the-box" inference-performance benchmark for computer vision
and natural-language processing on mobile devices. MLPerf Mobile can serve as a
framework for integrating future models, for customizing quality-target
thresholds to evaluate system performance, for comparing software frameworks,
and for assessing heterogeneous-hardware capabilities for machine learning, all
fairly and faithfully with fully reproducible results.
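As a rough illustration of what a single-stream inference-latency measurement of this kind looks like, here is a minimal sketch against the TensorFlow Lite Python API (one interpreter backend in the mobile ecosystem); the model file, input preprocessing, and query count are placeholder assumptions, not MLPerf Mobile's actual assets or run rules:

    import time
    import numpy as np
    import tensorflow as tf

    # Placeholder model; the real benchmark ships standardized models and data sets.
    interpreter = tf.lite.Interpreter(model_path="mobilenet_v1.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Random input stands in for a standardized data set; a real run would also
    # score a quality metric (e.g., Top-1 accuracy) against its target threshold.
    sample = np.random.random_sample(inp["shape"]).astype(np.float32)

    latencies = []
    for _ in range(100):  # fixed query count, for illustration only
        interpreter.set_tensor(inp["index"], sample)
        start = time.perf_counter()
        interpreter.invoke()
        latencies.append(time.perf_counter() - start)
        _ = interpreter.get_tensor(out["index"])

    latencies.sort()
    print(f"p90 latency: {latencies[89] * 1e3:.2f} ms")  # single-stream-style metric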
Related papers
- PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms [11.87161637895978]
We introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate large language models on mobile devices.
We provide a benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities.
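As a hedged illustration of what one quantization configuration might look like in code, here is a weight-quantization export via TensorFlow Lite; PalmBench's own model formats, runtimes, and calibration pipeline may differ, and "saved_model/" is a placeholder path:

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
    # Dynamic-range quantization: weights stored as int8, activations computed in
    # float; quantizing activations as well would additionally require a
    # representative calibration dataset.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    with open("model_w8.tflite", "wb") as f:
        f.write(converter.convert())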
arXiv Detail & Related papers (2024-10-05T03:37:07Z)
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation [10.817783356090027]
Large language models (LLMs) are increasingly integrated into every aspect of our work and daily lives.
Growing concerns about user privacy are pushing the trend toward local deployment of these models.
As on-device inference is a rapidly emerging application, we examine LLM performance on commercial off-the-shelf mobile devices.
arXiv Detail & Related papers (2024-10-04T17:14:59Z)
- VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents [50.12414817737912]
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents.
Existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments.
VisualAgentBench (VAB) is a pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents.
arXiv Detail & Related papers (2024-08-12T17:44:17Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
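A minimal sketch of those two per-task measurements (wall-clock latency and peak memory), using only the Python standard library on Unix; run_model() is a hypothetical stand-in for a real on-device LLM/LMM call:

    import resource
    import time

    def run_model(prompt: str) -> str:
        # Hypothetical placeholder for an on-device inference call.
        return prompt[::-1]

    start = time.perf_counter()
    _ = run_model("What is on-device AI?")
    latency_ms = (time.perf_counter() - start) * 1e3

    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"latency: {latency_ms:.1f} ms, peak RSS: {peak_kb / 1024:.1f} MiB")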
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
- MELTing point: Mobile Evaluation of Language Transformers [8.238355633015068]
We explore the current state of mobile execution of Large Language Models (LLMs).
We have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device.
We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance.
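The distinction between end-to-end and granular measurement can be sketched as timing the whole generation loop versus each decode step; generate_token() is a hypothetical stub, since MELT itself instruments real frameworks on real handsets:

    import time

    def generate_token(context: list) -> str:
        # Hypothetical stand-in for one autoregressive decode step.
        return "tok"

    context, per_token = [], []
    e2e_start = time.perf_counter()
    for _ in range(32):  # fixed output length, for illustration
        t0 = time.perf_counter()
        context.append(generate_token(context))
        per_token.append(time.perf_counter() - t0)
    e2e_s = time.perf_counter() - e2e_start

    print(f"end-to-end: {e2e_s:.3f} s, "
          f"mean per-token: {sum(per_token) / len(per_token) * 1e3:.2f} ms")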
arXiv Detail & Related papers (2024-03-19T15:51:21Z)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception [52.5831204440714]
We introduce Mobile-Agent, an autonomous multi-modal mobile device agent.
Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface.
It then autonomously plans and decomposes the complex operation task, navigating the mobile apps step by step through successive operations.
arXiv Detail & Related papers (2024-01-29T13:46:37Z)
- Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels [95.44077384918725]
We propose to teach large multi-modality models (LMMs) with text-defined rating levels instead of scores.
The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA) and video quality assessment (VQA) tasks.
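A minimal sketch of the discrete-level idea: the model rates with text levels, and a scalar score is recovered as a probability-weighted average over the level values. The five levels follow the paper; the logits below are made up:

    import numpy as np

    LEVELS = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}

    def level_logits_to_score(logits: np.ndarray) -> float:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()  # softmax over the level tokens
        return float(probs @ np.array(list(LEVELS.values()), dtype=float))

    # Made-up logits favoring "excellent" yield a score of roughly 4.4.
    print(level_logits_to_score(np.array([2.0, 1.0, 0.2, -1.0, -2.0])))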
arXiv Detail & Related papers (2023-12-28T16:10:25Z)
- Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction [28.53259866617677]
We introduce Mobile-Env, a comprehensive toolkit tailored for creating GUI benchmarks in the Android mobile environment.
We collect an open-world task set across various real-world apps and a fixed-world set based on WikiHow, which captures a significant amount of dynamic online content.
Our findings reveal that even advanced models struggle with tasks that are relatively simple for humans.
arXiv Detail & Related papers (2023-05-14T12:31:03Z)
- Meta Matrix Factorization for Federated Rating Predictions [84.69112252208468]
Federated recommender systems have distinct advantages in terms of privacy protection over traditional recommender systems.
Previous work on federated recommender systems does not fully consider the limitations of storage, RAM, energy and communication bandwidth in a mobile environment.
Our goal in this paper is to design a novel federated learning framework for rating prediction (RP) for mobile environments.
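The rating-prediction core can be sketched as a dot product of user and item embeddings, r_hat = p_u . q_i, with a local SGD step as might run on a device during a federated round; dimensions and hyperparameters are made up, and the paper's meta-network for generating item embeddings is not shown:

    import numpy as np

    rng = np.random.default_rng(0)
    p_u = rng.normal(scale=0.1, size=8)  # user embedding, kept on the device
    q_i = rng.normal(scale=0.1, size=8)  # item embedding, received from the server

    def local_update(r: float, lr: float = 0.05, reg: float = 0.01) -> None:
        # One SGD step on an observed rating r; the predicted rating is p_u . q_i.
        global p_u, q_i
        err = r - p_u @ q_i
        p_u += lr * (err * q_i - reg * p_u)
        q_i += lr * (err * p_u - reg * q_i)

    for _ in range(200):
        local_update(r=4.0)
    print(f"predicted rating: {p_u @ q_i:.2f}")  # approaches the observed 4.0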
arXiv Detail & Related papers (2019-10-22T16:29:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.