AIBench: An Agile Domain-specific Benchmarking Methodology and an AI
Benchmark Suite
- URL: http://arxiv.org/abs/2002.07162v1
- Date: Mon, 17 Feb 2020 07:29:05 GMT
- Title: AIBench: An Agile Domain-specific Benchmarking Methodology and an AI
Benchmark Suite
- Authors: Wanling Gao, Fei Tang, Jianfeng Zhan, Chuanxin Lan, Chunjie Luo, Lei
Wang, Jiahui Dai, Zheng Cao, Xiongwang Xiong, Zihan Jiang, Tianshu Hao, Fanda
Fan, Xu Wen, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen
Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe
Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Gang Lu, Junchao Shao,
Zhenyu Wang, Xiaoyu Wang, Hainan Ye
- Abstract summary: This paper proposes an agile domain-specific benchmarking methodology.
We identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks.
We present the first end-to-end Internet service AI benchmark.
- Score: 26.820244556465333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain-specific software and hardware co-design is encouraging as it is much
easier to achieve efficiency for fewer tasks. Agile domain-specific
benchmarking speeds up the process as it provides not only relevant design
inputs but also relevant metrics and tools. Unfortunately, modern workloads
like big data, AI, and Internet services dwarf traditional ones in terms of
code size, deployment scale, and execution path, and hence raise serious
benchmarking challenges.
This paper proposes an agile domain-specific benchmarking methodology.
Together with seventeen industry partners, we identify ten important end-to-end
application scenarios, from which sixteen representative AI tasks are
distilled as the AI component benchmarks. We propose permutations of
essential AI and non-AI component benchmarks as end-to-end benchmarks. An
end-to-end benchmark is a distillation of the essential attributes of an
industry-scale application. We design and implement a highly extensible,
configurable, and flexible benchmark framework, on the basis of which we
propose a guideline for building end-to-end benchmarks and present the first
end-to-end Internet service AI benchmark.
The preliminary evaluation shows the value of our benchmark suite, AIBench,
compared with MLPerf and TailBench, for hardware and software designers,
micro-architectural researchers, and code developers. The specifications,
source code, testbed, and results are publicly available from the web site
http://www.benchcouncil.org/AIBench/index.html.
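The abstract characterizes an end-to-end benchmark as a permutation of essential AI and non-AI component benchmarks distilled from an application scenario. As a minimal illustrative sketch only (the class names, stage names, and composition API below are hypothetical and do not correspond to the actual AIBench framework), such a composition could be expressed as an ordered pipeline of component stages:

```python
# Illustrative sketch only: a hypothetical composition of AI and non-AI
# component benchmarks into an end-to-end benchmark, in the spirit of the
# methodology described in the abstract. All names are invented for clarity
# and do not reflect the real AIBench framework API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ComponentBenchmark:
    """A single AI or non-AI component (e.g., recommendation, web serving)."""
    name: str
    is_ai: bool
    run: Callable[[dict], dict]  # consumes a request context, returns an updated context


@dataclass
class EndToEndBenchmark:
    """An ordered permutation of component benchmarks modeling one scenario."""
    scenario: str
    stages: List[ComponentBenchmark] = field(default_factory=list)

    def execute(self, request: dict) -> Dict[str, dict]:
        results: Dict[str, dict] = {}
        context = dict(request)
        for stage in self.stages:  # run stages in order along the critical path
            context = stage.run(context)
            results[stage.name] = dict(context)
        return results


# Hypothetical Internet-service scenario: non-AI serving wrapped around an AI task.
e2e = EndToEndBenchmark(
    scenario="online_shopping",
    stages=[
        ComponentBenchmark("web_serving", is_ai=False, run=lambda c: {**c, "session": "ok"}),
        ComponentBenchmark("recommendation", is_ai=True, run=lambda c: {**c, "items": [1, 2, 3]}),
        ComponentBenchmark("result_render", is_ai=False, run=lambda c: {**c, "page": "html"}),
    ],
)
print(e2e.execute({"user": 42}))
```

The actual framework additionally covers configuration, deployment, and measurement concerns that this sketch deliberately omits; it only conveys the core idea of chaining AI and non-AI components distilled from one scenario.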
Related papers
- BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices [28.70453947993952]
We develop an assessment framework considering 46 best practices across an AI benchmark's lifecycle and evaluate 24 AI benchmarks against it.
We find that there exist large quality differences and that commonly used benchmarks suffer from significant issues.
arXiv Detail & Related papers (2024-11-20T02:38:24Z) - Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms [77.71341200638416]
ChiPBench is a benchmark designed to evaluate the effectiveness of AI-based chip placement algorithms.
We have gathered 20 circuits from various domains (e.g., CPU, GPU, and microcontrollers) for evaluation.
Results show that even if the intermediate metric of a single-point algorithm is dominant, the final PPA (power, performance, and area) results are unsatisfactory.
arXiv Detail & Related papers (2024-07-03T03:29:23Z) - ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z) - Benchmarks for Automated Commonsense Reasoning: A Survey [0.0]
More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of AI systems.
This paper surveys the development and uses of AI commonsense benchmarks.
arXiv Detail & Related papers (2023-02-09T16:34:30Z) - Mystique: Enabling Accurate and Scalable Generation of Production AI
Benchmarks [2.0315147707806283]
Mystique is an accurate and scalable framework for production AI benchmark generation.
Mystique is scalable due to its lightweight data collection, in terms of runtime overhead and instrumentation effort.
We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models.
arXiv Detail & Related papers (2022-12-16T18:46:37Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - Design-Bench: Benchmarks for Data-Driven Offline Model-Based
Optimization [82.02008764719896]
Black-box model-based optimization problems are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots.
We present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods.
Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics.
arXiv Detail & Related papers (2022-02-17T05:33:27Z) - The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z) - Integrated Benchmarking and Design for Reproducible and Accessible
Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z) - AIBench Scenario: Scenario-distilling AI Benchmarking [8.909947747424672]
We formalize a real-world application scenario as a Directed Acyclic Graph-based model.
We propose the rules to distill it into a permutation of essential AI and non-AI tasks, which we call a scenario benchmark.
We implement two Internet service AI scenario benchmarks based on the framework as proxies to two real-world application scenarios.
arXiv Detail & Related papers (2020-05-06T01:24:25Z) - AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factor space that impacts the learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.