Benchmarking TinyML Systems: Challenges and Direction
- URL: http://arxiv.org/abs/2003.04821v4
- Date: Fri, 29 Jan 2021 21:21:37 GMT
- Title: Benchmarking TinyML Systems: Challenges and Direction
- Authors: Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel,
Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton
Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish
Thakker, Marian Verhelst, Poonam Yadav
- Abstract summary: We present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads.
Our viewpoints reflect the collective thoughts of the TinyMLPerf working group that is comprised of over 30 organizations.
- Score: 10.193715318589812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in ultra-low-power machine learning (TinyML) hardware
promise to unlock an entirely new class of smart applications. However,
continued progress is limited by the lack of a widely accepted benchmark for
these systems. Benchmarking allows us to measure and thereby systematically
compare, evaluate, and improve the performance of systems and is therefore
fundamental to a field reaching maturity. In this position paper, we present
the current landscape of TinyML and discuss the challenges and direction
towards developing a fair and useful hardware benchmark for TinyML workloads.
Furthermore, we present our four benchmarks and discuss our selection
methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf
working group that is comprised of over 30 organizations.
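The abstract's central claim is that systematic measurement is what allows TinyML systems to be compared, evaluated, and improved. As a purely illustrative sketch, and not the TinyMLPerf harness discussed in the paper, the snippet below shows how a host-side script might time repeated inferences of a tiny workload and report median and tail latency; the `run_inference` stub, the input shape, and the run counts are all assumptions made for this example.

```python
"""Illustrative host-side latency benchmark for a tiny ML workload.

This is a minimal sketch, NOT the TinyMLPerf / MLPerf Tiny harness; the
model stub below stands in for whatever deployed runtime a real
benchmark would exercise.
"""
import statistics
import time


def run_inference(input_frame):
    # Placeholder for an actual on-device or simulated inference call.
    # A real benchmark would invoke the deployed model here.
    return sum(input_frame) % 10  # dummy "prediction"


def benchmark_latency(num_runs=100, warmup=10):
    frame = [0.0] * 490  # assumed input size for illustration only
    for _ in range(warmup):  # warm-up runs are excluded from timing
        run_inference(frame)

    latencies_ms = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_inference(frame)
        latencies_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": sorted(latencies_ms)[int(0.9 * num_runs) - 1],
    }


if __name__ == "__main__":
    print(benchmark_latency())
```

A real TinyML benchmark would replace the stub with calls into the deployed runtime and pair the timing loop with external power measurement; agreeing on exactly that kind of methodology is the standardization problem the paper argues the field needs to solve.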
Related papers
- DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems.
Our benchmark involves the recruitment of human annotators and the generation of synthetic questions.
It includes 229 real documents and 1,102 questions, spanning five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z)
- Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models [50.653838482083614]
This paper introduces a scalable test-bed to assess the capabilities of Instruction Tuning Large Vision and Language Models (IT-LVLMs) on fundamental computer vision tasks.
MERLIM contains over 300K image-question pairs and has a strong focus on detecting cross-modal "hallucination" events in IT-LVLMs.
arXiv Detail & Related papers (2023-12-03T16:39:36Z)
- SEED-Bench-2: Benchmarking Multimodal Large Language Models [67.28089415198338]
Multimodal large language models (MLLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs.
SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, spanning 27 dimensions.
We evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations.
arXiv Detail & Related papers (2023-11-28T05:53:55Z)
- InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models [50.03163753638256]
Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence.
Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning.
We evaluate a selection of representative MLLMs using this rigorously developed open-ended multi-step elaborate reasoning benchmark.
arXiv Detail & Related papers (2023-11-20T07:06:31Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- MLonMCU: TinyML Benchmarking with Fast Retargeting [1.4319942396517]
It is non-trivial to choose the optimal combination of frameworks and targets for a given application.
A tool called MLonMCU is proposed in this paper and demonstrated by effortlessly benchmarking the state-of-the-art TinyML frameworks TFLite for Microcontrollers and TVM.
arXiv Detail & Related papers (2023-06-15T08:44:35Z)
- Intelligence at the Extreme Edge: A Survey on Reformable TinyML [0.0]
We present a survey of reformable TinyML solutions and propose a novel taxonomy for ease of separation.
We explore the workflow of TinyML and analyze the identified deployment schemes and the scarcely available benchmarking tools.
arXiv Detail & Related papers (2022-04-02T09:53:36Z)
- TinyML Platforms Benchmarking [0.0]
Recent advances in ultra-low power embedded devices for machine learning (ML) have permitted a new class of products.
TinyML provides a unique solution by aggregating and analyzing data at the edge on low-power embedded devices.
Many TinyML frameworks have been developed for different platforms to facilitate the deployment of ML models.
arXiv Detail & Related papers (2021-11-30T15:26:26Z)
- The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z)
- MLPerf Tiny Benchmark [1.1178096184080788]
We present MLPerf Tiny, the first industry-standard benchmark suite for ultra-low-power tiny machine learning systems.
MLPerf Tiny measures the accuracy, latency, and energy of machine learning inference to properly evaluate the tradeoffs between systems (an illustrative sketch of these three metrics follows this list).
arXiv Detail & Related papers (2021-06-14T17:05:17Z)
- Exploring and Analyzing Machine Commonsense Benchmarks [0.13999481573773073]
We argue that the lack of a common vocabulary for aligning these approaches' metadata limits researchers in their efforts to understand systems' deficiencies.
We describe our initial MCS Benchmark Ontology, a common vocabulary that formalizes benchmark metadata.
arXiv Detail & Related papers (2020-12-21T19:01:55Z)
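As noted in the MLPerf Tiny entry above, that suite reports accuracy, latency, and energy so that trade-offs between systems can be compared. The sketch below is a hypothetical post-processing step, not part of any published harness: it derives energy per inference from an assumed average power draw (energy = power x time) and prints the three metrics side by side. All device names and numbers are invented for illustration.

```python
"""Hypothetical accuracy/latency/energy summary for two devices.

The figures and the energy model (energy = average power x latency)
are illustrative assumptions, not measurements from any benchmark.
"""
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: float       # top-1 accuracy on the benchmark task
    latency_ms: float     # median single-inference latency
    avg_power_mw: float   # assumed average power draw during inference

    @property
    def energy_uj(self) -> float:
        # Energy per inference in microjoules: mW * ms = uJ.
        return self.avg_power_mw * self.latency_ms


results = [
    Result("device_a", accuracy=0.91, latency_ms=12.0, avg_power_mw=45.0),
    Result("device_b", accuracy=0.89, latency_ms=7.5, avg_power_mw=60.0),
]

for r in results:
    print(f"{r.name}: acc={r.accuracy:.2%} "
          f"latency={r.latency_ms:.1f} ms energy={r.energy_uj:.0f} uJ")
```

Reporting the three numbers together, rather than any single figure of merit, is what lets a benchmark expose the accuracy-versus-efficiency trade-offs that both the position paper and MLPerf Tiny emphasize.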
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.