DEAP: Design Space Exploration for DNN Accelerator Parallelism
- URL: http://arxiv.org/abs/2312.15388v1
- Date: Sun, 24 Dec 2023 02:43:01 GMT
- Title: DEAP: Design Space Exploration for DNN Accelerator Parallelism
- Authors: Ekansh Agrawal and Xiangyu Sam Xu
- Abstract summary: Large Language Models (LLMs) are becoming increasingly complex and powerful to train and serve.
This paper showcases how hardware and software co-design can come together and allow us to create customized hardware systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The boom in Large Language Models (LLMs) like GPT-4 and ChatGPT has marked a
significant advancement in artificial intelligence. These models are becoming
increasingly complex and powerful to train and serve. This growth in
capabilities comes with a substantial increase in computational requirements,
both in terms of hardware resources and energy consumption. The goal of this
paper is to showcase how hardware and software co-design can come together and
allow us to create customized hardware systems for specific LLM workloads. We
propose a simulation workflow that allows us to combine model parallelism
techniques with a multi-accelerator simulation framework for efficiency
metrics. We focus on inference workloads and report power, cycle, and latency
metrics upon performing a design space exploration search over multiple
software and hardware configurations.
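As a rough illustration of such a design space exploration loop (a minimal sketch, not the authors' actual tooling), the Python code below enumerates candidate hardware and model-parallelism configurations and queries a placeholder simulate() function for power, cycle, and latency estimates. All names, parameters, and the analytical cost model are assumptions made for illustration; a real backend would be the multi-accelerator simulator described in the paper.

# Illustrative sketch of a design space exploration (DSE) loop for LLM
# inference accelerators. The simulate() call is a hypothetical stand-in
# for the multi-accelerator simulation framework referenced in the abstract.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    num_accelerators: int      # hardware: accelerators per node
    systolic_array_dim: int    # hardware: PE array width/height
    tensor_parallel: int       # software: tensor (intra-layer) parallelism
    pipeline_parallel: int     # software: pipeline (inter-layer) parallelism

def simulate(cfg: Config) -> dict:
    """Hypothetical placeholder for a cycle-level multi-accelerator simulator.

    A real backend would replay the LLM inference workload under cfg and
    return measured metrics; here we return crude analytical estimates so
    the sketch runs standalone.
    """
    work = 1.0 / (cfg.tensor_parallel * cfg.pipeline_parallel)
    cycles = int(1e9 * work / cfg.systolic_array_dim)
    return {
        "cycles": cycles,
        "latency_ms": cycles / 1e6,              # assumes a 1 GHz clock
        "power_w": 50.0 * cfg.num_accelerators,  # crude per-accelerator power model
    }

def explore(space: dict) -> list[tuple[Config, dict]]:
    """Sweep the cross product of the search space, keeping only configurations
    whose combined parallelism degree fits on the available accelerators."""
    results = []
    for values in product(*space.values()):
        cfg = Config(**dict(zip(space.keys(), values)))
        if cfg.tensor_parallel * cfg.pipeline_parallel <= cfg.num_accelerators:
            results.append((cfg, simulate(cfg)))
    # Rank by latency; energy (power * time) would be another natural objective.
    return sorted(results, key=lambda r: r[1]["latency_ms"])

if __name__ == "__main__":
    space = {
        "num_accelerators": [4, 8],
        "systolic_array_dim": [128, 256],
        "tensor_parallel": [1, 2, 4, 8],
        "pipeline_parallel": [1, 2, 4],
    }
    best_cfg, best_metrics = explore(space)[0]
    print(best_cfg, best_metrics)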
Related papers
- Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML [0.0]
The CARAML benchmark suite is employed to assess performance and energy consumption during the training of large language models and computer vision models.
CARAML provides a compact, automated, and reproducible framework for assessing the performance and energy of ML workloads.
arXiv Detail & Related papers (2024-09-19T12:43:18Z)
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves substantial energy efficiency improvements and training cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces [5.692357167709513]
We propose Chakra, an open graph schema for standardizing workload specifications, capturing key operations and dependencies, also known as an Execution Trace (ET).
For instance, we use generative AI models to learn latent statistical properties across thousands of Chakra ETs and use these models to synthesize Chakra ETs.
Our end-goal is to build a vibrant industry-wide ecosystem of agile benchmarks and tools to drive future AI system co-design.
arXiv Detail & Related papers (2023-05-23T20:45:45Z)
- A Scalable Approach to Modeling on Accelerated Neuromorphic Hardware [0.0]
This work presents the software aspects of the BrainScaleS-2 system, a hybrid accelerated neuromorphic hardware architecture based on physical modeling.
We introduce key aspects of the BrainScaleS-2 Operating System: experiment workflow, API layering, software design, and platform operation.
The focus lies on novel system and software features such as multi-compartmental neurons, fast re-configuration for hardware-in-the-loop training, applications for the embedded processors, the non-spiking operation mode, interactive platform access, and sustainable hardware/software co-development.
arXiv Detail & Related papers (2022-03-21T16:30:18Z)
- Compiler-Driven Simulation of Reconfigurable Hardware Accelerators [0.8807375890824978]
Existing simulators tend toward two extremes; low-level, general approaches such as RTL simulation can model any hardware, but they require substantial effort and incur long execution times.
This work proposes a compiler-driven simulation workflow that can model hardware accelerators.
arXiv Detail & Related papers (2022-02-01T20:31:04Z)
- JUWELS Booster -- A Supercomputer for Large-Scale AI Research [79.02246047353273]
We present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Centre.
We detail its system architecture, parallel and distributed model training, and benchmarks indicating its outstanding performance.
arXiv Detail & Related papers (2021-06-30T21:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.