MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive
Multi-Accelerator Systems
- URL: http://arxiv.org/abs/2307.12234v1
- Date: Sun, 23 Jul 2023 05:50:37 GMT
- Title: MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive
Multi-Accelerator Systems
- Authors: Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu,
Quan Chen, Minyi Guo
- Abstract summary: We propose a novel mapping framework that can perform computation-aware accelerator selection and apply communication-aware sharding strategies to maximize parallelism.
We show that MARS can achieve 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.
- Score: 27.490645446510033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Along with the fast evolution of deep neural networks, the hardware system is
also developing rapidly. As a promising solution achieving high scalability and
low manufacturing cost, multi-accelerator systems widely exist in data centers,
cloud platforms, and SoCs. Thus, a challenging problem arises in
multi-accelerator systems: selecting a proper combination of accelerators from
available designs and searching for efficient DNN mapping strategies. To this
end, we propose MARS, a novel mapping framework that can perform
computation-aware accelerator selection, and apply communication-aware sharding
strategies to maximize parallelism. Experimental results show that MARS can
achieve 32.2% latency reduction on average for typical DNN workloads compared
to the baseline, and 59.4% latency reduction on heterogeneous models compared
to the corresponding state-of-the-art method.
Related papers
- Characterizing Speed Performance of Multi-Agent Reinforcement Learning [5.313762764969945]
Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc.
Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation.
We analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations.
arXiv Detail & Related papers (2023-09-13T17:26:36Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Precision-aware Latency and Energy Balancing on Multi-Accelerator
Platforms for DNN Inference [22.9834921448069]
We propose ODiMO, a hardware-aware tool that performs a fine-grain mapping across different accelerators on-chip.
We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drop (-0.53%/-0.32%) compared to manual mappings.
arXiv Detail & Related papers (2023-06-08T09:23:46Z) - Network-Aided Intelligent Traffic Steering in 6G O-RAN: A Multi-Layer
Optimization Framework [47.57576667752444]
We jointly optimize the flow-split distribution, congestion control and scheduling (JFCS) to enable an intelligent steering application in open RAN (O-RAN)
Our main contributions are three-fold: i) we propose the novel JFCS framework to efficiently and adaptively direct traffic to appropriate radio units; ii) we develop low-complexity algorithms based on the reinforcement learning, inner approximation and bisection search methods to effectively solve the JFCS problem in different time scales; and iv) the rigorous theoretical performance results are analyzed to show that there exists a scaling factor to improve the tradeoff between delay and utility-optimization
arXiv Detail & Related papers (2023-02-06T11:37:06Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge
Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC)
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - CoSA: Scheduling by Constrained Optimization for Spatial Accelerators [1.9149970150912705]
We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Networks (DNNs) accelerators.
As opposed to existing approaches that either rely on designers's or iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem.
We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5x.
arXiv Detail & Related papers (2021-05-05T07:17:25Z) - Distributed Multi-agent Meta Learning for Trajectory Design in Wireless
Drone Networks [151.27147513363502]
This paper studies the problem of the trajectory design for a group of energyconstrained drones operating in dynamic wireless network environments.
A value based reinforcement learning (VDRL) solution and a metatraining mechanism is proposed.
arXiv Detail & Related papers (2020-12-06T01:30:12Z) - DNA: Differentiable Network-Accelerator Co-Search [36.68587348474986]
We propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators.
Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators and compatible with DNN frameworks such as PyTorch to enable algorithmic exploration.
Experiments and ablation studies show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and accelerators.
arXiv Detail & Related papers (2020-10-28T05:57:16Z) - HAPI: Hardware-Aware Progressive Inference [18.214367595727037]
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks.
Despite their popularity, CNN inference still comes at a high computational cost.
This work presents HAPI, a novel methodology for generating high-performance early-exit networks.
arXiv Detail & Related papers (2020-08-10T09:55:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.