CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
- URL: http://arxiv.org/abs/2105.01898v1
- Date: Wed, 5 May 2021 07:17:25 GMT
- Title: CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
- Authors: Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah,
James Demmel, John Wawrzynek, Yakun Sophia Shao
- Abstract summary: We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Network (DNN) accelerators.
As opposed to existing approaches that rely either on designers' heuristics or on iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem.
We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5x.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Deep Neural Networks (DNNs) have led to active development
of specialized DNN accelerators, many of which feature a large number of
processing elements laid out spatially, together with a multi-level memory
hierarchy and flexible interconnect. While DNN accelerators can take advantage
of data reuse and achieve high peak throughput, they also expose a large number
of runtime parameters to the programmers who need to explicitly manage how
computation is scheduled both spatially and temporally. In fact, different
scheduling choices can lead to wide variations in performance and efficiency,
motivating the need for a fast and efficient search strategy to navigate the
vast scheduling space.
To address this challenge, we present CoSA, a constrained-optimization-based
approach for scheduling DNN accelerators. As opposed to existing approaches
that either rely on designers' heuristics or iterative methods to navigate the
search space, CoSA expresses scheduling decisions as a constrained-optimization
problem that can be deterministically solved using mathematical optimization
techniques. Specifically, CoSA leverages the regularities in DNN operators and
hardware to formulate the DNN scheduling space into a mixed-integer programming
(MIP) problem with algorithmic and architectural constraints, which can be
solved to automatically generate a highly efficient schedule in one shot. We
demonstrate that CoSA-generated schedules significantly outperform
state-of-the-art approaches by a geometric mean of up to 2.5x across a wide
range of DNNs while improving the time-to-solution by 90x.
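To make the MIP framing concrete, here is a minimal sketch, not CoSA's actual formulation: it decides a two-level tiling of a single loop bound by assigning each prime factor of the bound to either an inner on-chip tile or an outer DRAM loop. Tile footprints are products of factors, so taking logarithms turns the buffer-capacity constraint into a linear one over binary variables, the same linearization trick CoSA uses. The loop bound, buffer size, and objective below are invented for illustration, and the sketch uses the open-source PuLP package rather than the solver used in the paper.

```python
# A hypothetical single-dimension tiling MIP (illustration only).
import math
from pulp import LpMinimize, LpProblem, LpVariable, lpSum, value

prime_factors = [2, 2, 2, 7]   # loop bound 56 = 2 * 2 * 2 * 7 (made up)
bound = math.prod(prime_factors)
BUF_CAPACITY = 16              # hypothetical on-chip buffer size, in elements

prob = LpProblem("toy_tiling", LpMinimize)
# x[j] == 1 iff prime factor j is assigned to the inner (on-chip) tile
x = [LpVariable(f"x_{j}", cat="Binary") for j in range(len(prime_factors))]

# Capacity: product of inner factors <= BUF_CAPACITY,
# linearized as sum(log(f) * x_j) <= log(capacity).
prob += (lpSum(math.log(f) * x[j] for j, f in enumerate(prime_factors))
         <= math.log(BUF_CAPACITY))

# Toy objective: maximize the inner tile (more reuse, fewer DRAM accesses),
# expressed as minimizing the negated log of the inner tile size.
prob += -lpSum(math.log(f) * x[j] for j, f in enumerate(prime_factors))

prob.solve()
inner = math.prod(f for j, f in enumerate(prime_factors) if value(x[j]) > 0.5)
print(f"inner tile = {inner}, outer loop = {bound // inner}")  # 14 and 4
```

On this toy instance the solver keeps the factors 2 and 7 on chip (a tile of 14, within the capacity of 16) and leaves a factor of 4 outside; the paper's formulation layers analogous linear constraints across all loop dimensions, tensors, and memory levels, together with permutation and spatial-mapping variables.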
Related papers
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods; a toy sketch of the per-query KV selection idea follows this entry.
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
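The following is a hedged toy of the selection idea only, not the paper's SPARSEK operator (which uses a learned scoring network and a differentiable top-k relaxation so the selection can be trained end-to-end). It scores each cached key-value pair with an invented linear scorer and attends over a constant k of them, so per-query cost scales with k instead of sequence length; all shapes are assumptions.

```python
# Toy per-query KV selection (illustration only; hard, non-differentiable top-k).
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, k = 64, 512, 16            # hypothetical sizes
q = rng.standard_normal(d)             # one query vector
K = rng.standard_normal((seq_len, d))  # cached keys
V = rng.standard_normal((seq_len, d))  # cached values
w = rng.standard_normal(d)             # stand-in for a learned scoring network

scores = K @ w                           # one relevance score per KV pair
keep = np.argpartition(scores, -k)[-k:]  # indices of the k best-scoring pairs

# Ordinary softmax attention, but over only the k selected pairs.
logits = (K[keep] @ q) / np.sqrt(d)
attn = np.exp(logits - logits.max())
attn /= attn.sum()
out = attn @ V[keep]                     # (d,) output computed from k pairs
print(out.shape)
```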
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean square error without requiring additional training data; a toy sketch of the shared-backbone layout follows this entry.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
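The following is a hedged toy of the shared-backbone-plus-heads layout named in the summary, not MEMTL's actual networks, sizes, or training procedure; the averaging ensemble and all dimensions are assumptions for illustration.

```python
# Toy shared backbone with an ensemble of prediction heads (illustration only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, n_heads = 8, 32, 3              # hypothetical sizes

W_backbone = rng.standard_normal((d_in, d_feat))
heads = [rng.standard_normal(d_feat) for _ in range(n_heads)]

def predict(x: np.ndarray) -> float:
    feat = np.tanh(x @ W_backbone)            # shared representation
    preds = [feat @ w for w in heads]         # one prediction per head
    return float(np.mean(preds))              # ensemble by averaging

x = rng.standard_normal(d_in)                 # e.g., an offloading state vector
print(predict(x))
```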
- A Fast Task Offloading Optimization Framework for IRS-Assisted Multi-Access Edge Computing System [14.82292289994152]
We propose a deep learning-based optimization framework called Iterative Order-Preserving policy Optimization (IOPO).
Experimental results demonstrate that IOPO generates energy-efficient task-offloading decisions within milliseconds.
arXiv Detail & Related papers (2023-07-17T13:32:02Z)
- KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow [0.0]
We build a generic, optimized, and fast dataflow solver, KAPLA, to explore the design space with effective validity check and efficiency estimation.
KAPLA's resulting dataflows incur only 2.2% and 7.7% energy overheads for training and inference, respectively.
It also outperforms random and machine-learning-based approaches, delivering more optimized results with orders-of-magnitude faster search.
arXiv Detail & Related papers (2023-06-09T03:12:42Z)
- RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
It generates near-optimal scheduling results with low solver runtime overhead.
Our framework has demonstrated up to ~2.5x real-world on-chip inference runtime speedups over the commercial compiler.
arXiv Detail & Related papers (2023-04-10T17:22:12Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework; a toy sketch of block-structured pruning follows this entry.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
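The following is a hedged toy in the spirit of the fine-grained structured pruning named above, not the paper's actual patterns or compiler passes: within each fixed-size block of a weight row, only the largest-magnitude weights are kept, so the resulting sparsity is regular enough for a compiler to exploit. The block size and keep ratio are assumptions.

```python
# Toy fine-grained structured pruning: keep 2 of every 4 weights per row block.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # hypothetical layer weight matrix
BLOCK, KEEP = 4, 2

W_pruned = W.copy()
for i in range(0, W.shape[1], BLOCK):
    blk = W_pruned[:, i:i + BLOCK]             # view into the block columns
    # per row, indices of the (BLOCK - KEEP) smallest-magnitude weights
    drop = np.argsort(np.abs(blk), axis=1)[:, : BLOCK - KEEP]
    np.put_along_axis(blk, drop, 0.0, axis=1)  # zero them in place

print(f"structured sparsity: {(W_pruned == 0).mean():.0%}")  # 50%
```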
- Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.