Performance Analysis of Deep Learning Workloads on a Composable System
- URL: http://arxiv.org/abs/2103.10911v1
- Date: Fri, 19 Mar 2021 17:15:42 GMT
- Title: Performance Analysis of Deep Learning Workloads on a Composable System
- Authors: Kaoutar El Maghraoui and Lorraine M. Herger and Chekuri Choudary and
Kim Tran and Todd Deshane and David Hanson
- Abstract summary: Composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool.
This paper details the design of an enterprise composable infrastructure that we have implemented and made available to our partners in the IBM Research AI Hardware Center.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A composable infrastructure is defined as resources, such as compute,
storage, accelerators and networking, that are shared in a pool and that can be
grouped in various configurations to meet application requirements. This
freedom to 'mix and match' resources dynamically allows for experimentation
early in the design cycle, prior to the final architectural design or hardware
implementation of a system. This design offers the flexibility to serve a
variety of workloads and provides a dynamic co-design platform that allows
experiments and measurements to be carried out in a controlled manner. For
instance, key performance bottlenecks can be revealed early in the
experimentation phase, thus avoiding costly and time-consuming mistakes.
Additionally, various system-level topologies can be evaluated when
experimenting with new Systems-on-Chip (SoCs) and new accelerator types. This
paper details the design of an enterprise composable infrastructure that we
have implemented and made available to our partners in the IBM Research AI
Hardware Center (AIHC). Our experimental evaluations on the composable system
give insights into how the system works and quantify the impact of various
resource aggregations and reconfigurations on representative deep learning
benchmarks.
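
To make the abstract's 'mix and match' idea concrete, below is a minimal, purely illustrative Python sketch of a composable resource pool. The ResourcePool class, its compose/release methods, and all capacity numbers are hypothetical and are not part of the paper's system, which composes physical resources (GPUs, storage, NICs) over a hardware fabric rather than software objects.

    # Hypothetical sketch: models composable infrastructure as a shared pool
    # from which logical nodes are carved out and later returned. All names
    # and capacities here are invented for illustration.
    from dataclasses import dataclass, field

    @dataclass
    class ResourcePool:
        gpus: int = 16        # pooled accelerators
        nvme_tb: int = 32     # pooled NVMe storage, in TB
        nics_100g: int = 8    # pooled 100G network interfaces
        allocations: list = field(default_factory=list)

        def compose(self, name: str, gpus: int = 0, nvme_tb: int = 0,
                    nics_100g: int = 0) -> dict:
            # Carve a logical node out of the shared pool; fail if exhausted.
            if gpus > self.gpus or nvme_tb > self.nvme_tb or nics_100g > self.nics_100g:
                raise RuntimeError(f"pool cannot satisfy request for {name}")
            self.gpus -= gpus
            self.nvme_tb -= nvme_tb
            self.nics_100g -= nics_100g
            node = {"name": name, "gpus": gpus, "nvme_tb": nvme_tb,
                    "nics_100g": nics_100g}
            self.allocations.append(node)
            return node

        def release(self, node: dict) -> None:
            # Return a node's resources to the pool, enabling reconfiguration.
            self.gpus += node["gpus"]
            self.nvme_tb += node["nvme_tb"]
            self.nics_100g += node["nics_100g"]
            self.allocations.remove(node)

    pool = ResourcePool()
    train = pool.compose("dl-training", gpus=8, nvme_tb=4, nics_100g=2)  # GPU-heavy node
    prep = pool.compose("data-prep", nvme_tb=16, nics_100g=4)            # storage-heavy node
    pool.release(train)  # give the GPUs back, then try a different topology

The point of the sketch is the lifecycle the paper studies at the hardware level: carve a configuration out of the shared pool, run a workload, then return the resources so a different aggregation can be evaluated.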
Related papers
- From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems (arXiv, 2024-09-08)
We study several neural network architectures that are key components of sound event detection systems.
We measure the energy consumption for training and testing small to large architectures.
We establish complex relationships between the energy consumption, the number of floating-point operations, the number of parameters, and the GPU/memory utilization.
- Full-stack evaluation of Machine Learning inference workloads for RISC-V systems (arXiv, 2024-05-24)
This study evaluates the performance of a wide array of machine learning workloads on RISC-V architectures using gem5, an open-source architectural simulator.
Leveraging an open-source compilation toolchain based on Multi-Level Intermediate Representation (MLIR), the research presents benchmarking results specifically focused on deep learning inference workloads.
- PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC (arXiv, 2024-04-30)
We develop an end-to-end open-source few-shot learning pipeline for object classification on an FPGA system.
We build and deploy a low-power, low-latency demonstrator trained on the MiniImageNet dataset with a dataflow architecture.
The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.
- Multilayer Environment and Toolchain for Holistic NetwOrk Design and Analysis (arXiv, 2023-10-24)
This work analyses in detail the requirements for distributed systems assessment.
Our approach emphasizes setting up and assessing a broader spectrum of distributed systems.
We demonstrate the framework's capabilities to provide valuable insights across various use cases.
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators (arXiv, 2023-05-24)
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review (arXiv, 2022-04-29)
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems (arXiv, 2021-08-09)
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
- Elastic Architecture Search for Diverse Tasks with Different Resources (arXiv, 2021-08-03)
We study the challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and the task of interest (a group of classes) are dynamically specified at test time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents (arXiv, 2020-09-09)
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
- How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS (arXiv, 2020-03-09)
We show that some commonly-used baselines for super-net training negatively impact the correlation between super-net and stand-alone performance.
Our code and experiments set a strong and reproducible baseline that future works can build on.