Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
- URL: http://arxiv.org/abs/2409.12994v2
- Date: Tue, 29 Oct 2024 09:07:58 GMT
- Title: Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
- Authors: Chelsea Maria John, Stepan Nassyr, Carolin Penke, Andreas Herten,
- Abstract summary: The CARAML benchmark suite is employed to assess performance and energy consumption during the training of large language models and computer vision models.
CarAML provides a compact, automated, and reproducible framework for assessing the performance and energy of ML workloads.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rapid advancement of machine learning (ML) technologies has driven the development of specialized hardware accelerators designed to facilitate more efficient model training. This paper introduces the CARAML benchmark suite, which is employed to assess performance and energy consumption during the training of transformer-based large language models and computer vision models on a range of hardware accelerators, including systems from NVIDIA, AMD, and Graphcore. CARAML provides a compact, automated, extensible, and reproducible framework for assessing the performance and energy of ML workloads across various novel hardware architectures. The design and implementation of CARAML, along with a custom power measurement tool called jpwr, are discussed in detail.
Related papers
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z) - Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - A Survey on Hardware Accelerators for Large Language Models [0.0]
Large Language Models (LLMs) have emerged as powerful tools for natural language processing tasks.
There is a pressing need to address the computational challenges associated with their scale and complexity.
arXiv Detail & Related papers (2024-01-18T11:05:03Z) - DEAP: Design Space Exploration for DNN Accelerator Parallelism [0.0]
Large Language Models (LLMs) are becoming increasingly complex and powerful to train and serve.
This paper showcases how hardware and software co-design can come together and allow us to create customized hardware systems.
arXiv Detail & Related papers (2023-12-24T02:43:01Z) - Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs [14.397623940689487]
Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms are reviewed.
This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators.
arXiv Detail & Related papers (2023-11-08T01:06:25Z) - SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on
FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z) - Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon
Photonics [5.190207094732673]
Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPU for energy-efficient ML processing.
We present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators.
arXiv Detail & Related papers (2023-01-28T17:06:53Z) - SeLoC-ML: Semantic Low-Code Engineering for Machine Learning
Applications in Industrial IoT [9.477629856092218]
This paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML)
SeLoC-ML enables non-experts to model, discover, reuse, and matchmake ML models and devices at scale.
Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications.
arXiv Detail & Related papers (2022-07-18T13:06:21Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.