AIBench Training: Balanced Industry-Standard AI Training Benchmarking
- URL: http://arxiv.org/abs/2004.14690v4
- Date: Wed, 10 Mar 2021 06:24:57 GMT
- Title: AIBench Training: Balanced Industry-Standard AI Training Benchmarking
- Authors: Fei Tang, Wanling Gao, Jianfeng Zhan, Chuanxin Lan, Xu Wen, Lei Wang,
Chunjie Luo, Jiahui Dai, Zheng Cao, Xingwang Xiong, Zihan Jiang, Tianshu Hao,
Fanda Fan, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen
Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe
Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Junchao Shao, Zhenyu Wang,
Xiaoyu Wang, and Hainan Ye
- Abstract summary: Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factor space that impacts the learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
- Score: 26.820244556465333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Earlier-stage evaluations of a new AI architecture/system need affordable
benchmarks. Using only a few AI component benchmarks such as MLPerf in the
other stages may lead to misleading conclusions. Moreover, the learning
dynamics are not well understood, and the benchmarks' shelf-life is short. This
paper proposes a balanced benchmarking methodology. We use real-world
benchmarks to cover, to the largest extent, the factor space that impacts the
learning dynamics. After performing an exhaustive survey of Internet
service AI domains, we identify and implement nineteen representative AI tasks
with state-of-the-art models. For repeatable performance ranking (the RPR
subset) and workload characterization (the WC subset), we keep these two subsets to a minimum for
affordability. We contribute by far the most comprehensive AI training
benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms
MLPerf Training (v0.7) in the diversity and representativeness of model
complexity, computational cost, convergence rate, computation and memory access
patterns, and hotspot functions; (2) against the full AIBench benchmarks, the
RPR subset reduces the benchmarking cost by 64% while maintaining the primary
workload characteristics; (3) the performance ranking shows that a single-purpose
AI accelerator like the TPU, with the optimized TensorFlow framework, performs
better than GPUs while losing the latter's general support for various AI
models. The specification, source code, and performance numbers are available
from the AIBench homepage
https://www.benchcouncil.org/aibench-training/index.html.
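The RPR and WC subsets are the affordability lever of the methodology: a few workloads are kept that preserve the characteristics of the full suite while cutting the benchmarking cost (the reported 64% reduction for the RPR subset). The paper's own selection procedure is not reproduced here; the sketch below is only a minimal illustration of one way such a representative subset could be derived, assuming each workload has already been profiled into a feature vector. The feature names, numbers, and the k-means clustering choice are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch: choosing a small "representative" subset of benchmarks
# (in the spirit of AIBench's RPR/WC subsets) by clustering workloads on their
# profiled characteristics. All feature names and values are illustrative
# assumptions, not data from the paper.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Each workload is described by a profiled feature vector, e.g.
# [model parameters (M), computational cost (GFLOPs/step),
#  epochs to convergence, memory-bandwidth intensity].
workloads = {
    "image_classification": [25.6, 4.1, 90, 0.62],
    "object_detection":     [41.8, 91.0, 12, 0.71],
    "translation":          [213.0, 10.6, 8, 0.55],
    "speech_recognition":   [120.0, 30.0, 20, 0.48],
    "recommendation":       [540.0, 0.8, 1, 0.90],
    "text_to_text":         [11.0, 3.4, 30, 0.50],
}

names = list(workloads)
X = StandardScaler().fit_transform(np.array(list(workloads.values())))

# Cluster workloads by similarity of characteristics, then keep one
# representative per cluster: the workload closest to each centroid.
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
subset = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    subset.append(names[members[np.argmin(dists)]])

print("Representative subset:", sorted(subset))
```

With real profiles, the subset size would be picked as the smallest value that still preserves the full suite's performance ranking and hotspot behavior.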
Related papers
- Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? [90.30635552818875]
We present Touchstone, a large-scale collaborative segmentation benchmark of 9 types of abdominal organs.
This benchmark is based on 5,195 training CT scans from 76 hospitals around the world and 5,903 testing CT scans from 11 additional hospitals.
We invited 14 inventors of 19 AI algorithms to train their algorithms, while our team, as a third party, independently evaluated these algorithms on three test sets.
arXiv Detail & Related papers (2024-11-06T05:09:34Z)
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [73.14232472724758]
We introduce Agent K v1.0, an end-to-end autonomous data science agent.
It manages the entire data science life cycle by learning from experience.
It optimises long- and short-term memory by selectively storing and retrieving key information.
arXiv Detail & Related papers (2024-11-05T23:55:23Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks, both on real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z)
- Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields [5.622820801789953]
AI for science (AI4S) aims to enhance the accuracy and speed of scientific computing tasks using machine learning methods.
Traditional AI benchmarking methods struggle to adapt to the unique challenges posed by AI4S because they assume data in training, testing, and future real-world queries are independent and identically distributed.
This paper investigates the need for a novel approach to effectively benchmark AI for science, using the machine learning force field (MLFF) as a case study.
arXiv Detail & Related papers (2023-08-11T08:06:58Z)
- Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization? [17.21160278797221]
Multi-fidelity HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and discards low-performing models early on.
We compared various representative MF-HPO methods against a simple baseline on classical benchmark data.
This baseline achieved similar results to its counterparts, while requiring an order of magnitude less computation.
arXiv Detail & Related papers (2023-07-28T09:14:41Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on a suite of real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks [2.0315147707806283]
Mystique is an accurate and scalable framework for production AI benchmark generation.
Mystique is scalable, thanks to its lightweight data collection, in terms of both runtime overhead and instrumentation effort.
We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models.
arXiv Detail & Related papers (2022-12-16T18:46:37Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- AIPerf: Automated machine learning as an AI-HPC benchmark [17.57686674304368]
We propose an end-to-end benchmark suite utilizing automated machine learning (AutoML).
We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems.
With a flexible workload and a single metric, our benchmark can scale and rank AI-HPC systems easily.
arXiv Detail & Related papers (2020-08-17T08:06:43Z)
- AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite [26.820244556465333]
This paper proposes an agile domain-specific benchmarking methodology.
We identify ten important end-to-end application scenarios, among which sixteen representative AI tasks are distilled as the AI component benchmarks.
We present the first end-to-end Internet service AI benchmark.
arXiv Detail & Related papers (2020-02-17T07:29:05Z)