KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow
- URL: http://arxiv.org/abs/2306.15676v1
- Date: Fri, 9 Jun 2023 03:12:42 GMT
- Authors: Zhiyao Li (1), Mingyu Gao (1) ((1) Tsinghua University)
- Abstract summary: We build a generic, optimized, and fast dataflow solver, KAPLA, to explore the design space with effective validity checking and efficiency estimation.
The dataflows KAPLA finds incur only 2.2% and 7.7% energy overheads for training and inference, respectively, compared to the exhaustively searched optimal schemes.
It also outperforms random and machine-learning-based approaches, producing more optimized results with orders-of-magnitude faster search.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataflow scheduling decisions are of vital importance to neural network (NN)
accelerators. Recent scalable NN accelerators support a rich set of advanced
dataflow techniques. The problems of comprehensively representing and quickly
finding optimized dataflow schemes thus become significantly more complicated
and challenging. In this work, we first propose comprehensive and pragmatic
dataflow representations for temporal and spatial scheduling on scalable
multi-node NN architectures. An informal hierarchical taxonomy highlights the
tight coupling across different levels of the dataflow space as the major
difficulty for fast design exploration. A set of formal tensor-centric
directives accurately expresses various inter-layer and intra-layer schemes,
and allows for quickly determining their validity and efficiency. We then build a
generic, optimized, and fast dataflow solver, KAPLA, which makes use of the
pragmatic directives to explore the design space with effective validity check
and efficiency estimation. KAPLA decouples the upper inter-layer level for fast
pruning, and solves the lower intra-layer schemes with a novel bottom-up cost
descending method. The dataflows KAPLA produces incur only 2.2% and 7.7%
energy overheads for training and inference, respectively, compared to the
exhaustively searched optimal schemes. It also outperforms random and
machine-learning-based approaches, producing more optimized results with
orders-of-magnitude faster search.
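
The abstract names three concrete mechanisms: tensor-centric directives that encode a scheme, validity checking with efficiency estimation, and a bottom-up cost-descending search over intra-layer schemes. As a rough illustration only, the Python sketch below mimics that loop on a deliberately simplified model: one conv-like layer, one on-chip buffer, and a made-up off-chip-traffic cost. Every name and constant here (Directive, valid, cost, bottom_up_descend, LAYER, BUFFER_WORDS) is hypothetical, not KAPLA's actual formulation.

    # Hypothetical sketch only; not KAPLA's actual directives or cost model.
    from dataclasses import dataclass, replace

    LAYER = {"N": 16, "C": 64, "K": 128}  # toy dims: batch, in-channels, out-channels
    BUFFER_WORDS = 4096                   # toy on-chip buffer capacity

    @dataclass(frozen=True)
    class Directive:
        """Toy intra-layer scheme: tile sizes held on-chip for one layer."""
        tile_n: int
        tile_c: int
        tile_k: int

    def valid(d: Directive) -> bool:
        """Validity check: tiles must divide the layer and fit the buffer."""
        divides = all(LAYER[dim] % t == 0 for dim, t in
                      (("N", d.tile_n), ("C", d.tile_c), ("K", d.tile_k)))
        footprint = d.tile_n * d.tile_c + d.tile_c * d.tile_k + d.tile_n * d.tile_k
        return divides and footprint <= BUFFER_WORDS

    def cost(d: Directive) -> int:
        """Toy efficiency estimate: off-chip words moved under this tiling."""
        trips_n = LAYER["N"] // d.tile_n
        trips_c = LAYER["C"] // d.tile_c
        trips_k = LAYER["K"] // d.tile_k
        ifmap  = LAYER["N"] * LAYER["C"] * trips_k  # inputs re-fetched per K tile
        weight = LAYER["C"] * LAYER["K"] * trips_n  # weights re-fetched per N tile
        ofmap  = LAYER["N"] * LAYER["K"] * trips_c  # outputs re-spilled per C tile
        return ifmap + weight + ofmap

    def bottom_up_descend(start: Directive) -> Directive:
        """Greedy cost descent: repeatedly double one tile dimension while the
        scheme stays valid and the estimated cost keeps dropping."""
        best = start
        improved = True
        while improved:
            improved = False
            for field in ("tile_n", "tile_c", "tile_k"):
                cand = replace(best, **{field: getattr(best, field) * 2})
                if valid(cand) and cost(cand) < cost(best):
                    best, improved = cand, True
        return best

    if __name__ == "__main__":
        result = bottom_up_descend(Directive(tile_n=1, tile_c=1, tile_k=1))
        print(result, "off-chip words:", cost(result))

Note that this toy descent only explores one layer's intra-layer space; per the abstract, the real solver also decouples and prunes the upper inter-layer level, which the sketch omits.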
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present a first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN.
This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
- Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation [80.227864832092]
Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications.
The sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs.
We propose an online propagation framework and two novel node-adaptive propagation methods.
arXiv Detail & Related papers (2023-10-17T05:03:00Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
- Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian-enhanced low-rank tensor (LETC) framework featuring both low-rankness and multi-temporal correlations for large-scale traffic speed kriging.
We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z)
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators [1.9149970150912705]
We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Network (DNN) accelerators.
As opposed to existing approaches that rely either on designers' heuristics or on iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem; see the constrained-optimization sketch after this list.
We demonstrate that CoSA-generated schedules outperform state-of-the-art approaches by up to 2.5x (geometric mean).
arXiv Detail & Related papers (2021-05-05T07:17:25Z)
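
For the CoSA entry above, the following dependency-free Python sketch conveys the general idea of scheduling as constrained optimization: enumerate candidate tilings of a toy matmul, discard those that violate hardware constraints, and keep the one minimizing a latency proxy. CoSA itself encodes the problem as a mixed-integer program handed to an off-the-shelf solver; the brute-force search, the constants, and the cost model below are all made-up stand-ins.

    # Made-up illustration of constrained scheduling; not CoSA's MIP formulation.
    from itertools import product

    M, K, N = 128, 64, 256        # toy matmul dimensions
    PE_ROWS, PE_COLS = 8, 8       # toy spatial PE array
    SCRATCHPAD = 8192             # toy on-chip capacity in words

    def divisors(x: int) -> list[int]:
        return [d for d in range(1, x + 1) if x % d == 0]

    best = None
    for tm, tk, tn in product(divisors(M), divisors(K), divisors(N)):
        if tm > PE_ROWS or tn > PE_COLS:              # constraint: fit the PE array
            continue
        if tm * tk + tk * tn + tm * tn > SCRATCHPAD:  # constraint: fit on-chip storage
            continue
        invocations = (M // tm) * (K // tk) * (N // tn)
        latency = invocations * (tk + tm * tn)        # toy proxy: compute + tile load
        if best is None or latency < best[0]:
            best = (latency, (tm, tk, tn))

    print("best tiles (tm, tk, tn):", best[1], "latency proxy:", best[0])

A real constrained-optimization formulation would replace this enumeration with solver variables and express the same divisibility and capacity constraints algebraically, which is what lets it scale far beyond brute force.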