AIPerf: Automated machine learning as an AI-HPC benchmark
- URL: http://arxiv.org/abs/2008.07141v7
- Date: Mon, 15 Mar 2021 02:25:55 GMT
- Title: AIPerf: Automated machine learning as an AI-HPC benchmark
- Authors: Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong
Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen
- Abstract summary: We propose an end-to-end benchmark suite utilizing automated machine learning (AutoML).
We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems.
With a flexible workload and a single metric, our benchmark can scale to and rank AI-HPC systems easily.
- Score: 17.57686674304368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The plethora of complex artificial intelligence (AI) algorithms and available
high performance computing (HPC) power stimulates the expeditious development
of AI components with heterogeneous designs. Consequently, the need for
cross-stack performance benchmarking of AI-HPC systems emerges rapidly. The de
facto HPC benchmark, LINPACK, cannot reflect AI computing power and I/O
performance without representative workloads. Popular AI benchmarks such as
MLPerf have fixed problem sizes and therefore limited scalability. To address
these issues, we propose an end-to-end benchmark suite utilizing automated
machine learning (AutoML), which not only represents real AI scenarios but is
also auto-adaptively scalable to machines of various scales. We implement
the algorithms in a highly parallel and flexible way to ensure the efficiency
and optimization potential on diverse systems with customizable configurations.
We utilize operations per second (OPS), which is measured in an analytical and
systematic approach, as the major metric to quantify the AI performance. We
perform evaluations on various systems to ensure the benchmark's stability and
scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured), up
to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the
results show near-linear weak scalability. With a flexible workload and a
single metric, our benchmark can scale to and rank AI-HPC systems easily.
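As a rough illustration of an OPS-style measurement (the network shape, the operation counting, and the timing scaffold below are our assumptions, not the paper's analytical model), one can estimate sustained OPS by dividing analytically counted operations by wall-clock time:

```python
import time

def flops_per_sample(layers):
    """Analytically count operations for a toy dense network.

    `layers` is a list of (fan_in, fan_out) pairs; one dense layer costs
    roughly 2 * fan_in * fan_out multiply-add operations per sample.
    """
    return sum(2 * fan_in * fan_out for fan_in, fan_out in layers)

def measure_ops(train_step, layers, batch_size, num_batches):
    """Estimate sustained OPS as counted operations / wall-clock time."""
    per_batch = flops_per_sample(layers) * batch_size
    start = time.perf_counter()
    for _ in range(num_batches):
        train_step()  # one training iteration on one batch (user-supplied)
    elapsed = time.perf_counter() - start
    return per_batch * num_batches / elapsed
```

In the paper's setting the counted operations presumably come from the AutoML-discovered networks themselves; the scaffold above only shows how analytically counted operations and elapsed time combine into a single OPS figure.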
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR reduces the computational cost of the LLM by 5.2-6.5x and its GPU memory by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
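A hedged sketch of the dynamic early-exit idea above (the confidence rule, the threshold, and the module interfaces are illustrative assumptions, not DeeR's actual mechanism):

```python
import torch

def early_exit_forward(blocks, exit_heads, x, threshold=0.9):
    """Run blocks sequentially; stop as soon as an intermediate exit
    head is confident, so easy inputs activate a smaller model."""
    hidden = x
    logits = None
    for block, head in zip(blocks, exit_heads):
        hidden = block(hidden)
        logits = head(hidden)
        confidence = torch.softmax(logits, dim=-1).max(dim=-1).values.mean()
        if confidence >= threshold:
            break  # exit early: the remaining blocks are skipped
    return logits
```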
- Adaptation of XAI to Auto-tuning for Numerical Libraries [0.0]
Explainable AI (XAI) technology is gaining prominence, aiming to streamline AI model development and alleviate the burden of explaining AI outputs to users.
This research focuses on XAI for AI models when integrated into two different processes for practical numerical computations.
arXiv Detail & Related papers (2024-05-12T09:00:56Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
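The summary above does not state the proposed metric, so the following is only a hedged illustration of one way to score inference efficiency; the tokens-per-second-per-dollar formula and all names here are assumptions, not the paper's metric:

```python
import time

def efficiency_score(generate, prompt, price_per_1k_tokens):
    """Toy inference-efficiency score: output tokens per second per dollar.

    `generate` is any callable returning the list of generated tokens;
    the metric itself is illustrative, not the paper's proposal.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    throughput = len(tokens) / elapsed                # tokens per second
    cost = len(tokens) / 1000 * price_per_1k_tokens   # dollars for this call
    return throughput / cost if cost else float("inf")
```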
- Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks [2.0315147707806283]
Mystique is an accurate and scalable framework for production AI benchmark generation.
Mystique is scalable due to its lightweight data collection, which keeps both runtime overhead and instrumentation effort low.
We evaluate our methodology on several production AI models and show that benchmarks generated with Mystique closely resemble the original AI models.
arXiv Detail & Related papers (2022-12-16T18:46:37Z)
- SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems [18.699431277588637]
We propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems.
By continually scaling the data and model, we can investigate the trend and range of AI performance on HPC systems.
arXiv Detail & Related papers (2022-12-07T02:42:29Z)
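A minimal sketch of a scaling-sweep harness in the spirit of the methodology above (the scale grid, the `run_benchmark` callable, and the weak-scaling-efficiency readout are assumptions):

```python
def scaling_sweep(run_benchmark, scales):
    """Sweep workload scale and report throughput plus weak-scaling
    efficiency relative to the smallest configuration."""
    baseline = None
    for nodes, model_size, data_size in scales:
        ops = run_benchmark(nodes, model_size, data_size)  # measured OPS
        if baseline is None:
            baseline = ops / nodes                         # per-node reference
        efficiency = (ops / nodes) / baseline              # 1.0 = linear scaling
        print(f"{nodes:5d} nodes: {ops:.3e} OPS, efficiency {efficiency:.2f}")
```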
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [55.485985317538194]
ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
arXiv Detail & Related papers (2022-06-14T17:09:35Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
However, deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
- Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
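In generic form, the joint design described above can be written as the optimization below; this is a textbook-style formulation under standard RIS assumptions (e.g., unit-modulus phase shifts), and the paper's exact objective, constraints, and notation may differ:

```latex
\min_{\mathbf{p},\,\mathbf{W},\,\boldsymbol{\Theta}}
    \sum_{k=1}^{K} \varepsilon_k(\mathbf{p}, \mathbf{W}, \boldsymbol{\Theta})
\quad \text{s.t.} \quad
    0 \le p_k \le p_{\max}, \qquad
    \|\mathbf{w}_k\|^2 \le P_{\mathrm{BS}}, \qquad
    |\theta_n| = 1 \;\; \forall n,
```

where \varepsilon_k is the learning error of user k, p_k the transmit power of mobile user k, \mathbf{w}_k the base-station beamforming vector for user k, and \boldsymbol{\Theta} = \mathrm{diag}(\theta_1, \ldots, \theta_N) the RIS phase-shift matrix.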
- AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factor space that impacts learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.