Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling
- URL: http://arxiv.org/abs/2205.05826v1
- Date: Thu, 12 May 2022 01:28:03 GMT
- Title: Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling
- Authors: Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel
S. Emer
- Abstract summary: This paper first presents a unified taxonomy to systematically describe the diverse sparse tensor accelerator design space.
Based on the proposed taxonomy, it then introduces Sparseloop, the first fast, accurate, and flexible analytical modeling framework.
Sparseloop comprehends a large set of architecture specifications, including various dataflows and sparse acceleration features.
- Score: 10.610523739702971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, many accelerators have been proposed to efficiently process
sparse tensor algebra applications (e.g., sparse neural networks). However,
these proposals are single points in a large and diverse design space. The lack
of systematic description and modeling support for these sparse tensor
accelerators impedes hardware designers from efficient and effective design
space exploration. This paper first presents a unified taxonomy to
systematically describe the diverse sparse tensor accelerator design space.
Based on the proposed taxonomy, it then introduces Sparseloop, the first fast,
accurate, and flexible analytical modeling framework to enable early-stage
evaluation and exploration of sparse tensor accelerators. Sparseloop
comprehends a large set of architecture specifications, including various
dataflows and sparse acceleration features (e.g., elimination of zero-based
compute). Using these specifications, Sparseloop evaluates a design's
processing speed and energy efficiency while accounting for data movement and
compute incurred by the employed dataflow as well as the savings and overhead
introduced by the sparse acceleration features using stochastic tensor density
models. Across representative accelerators and workloads, Sparseloop achieves
over 2000 times faster modeling speed than cycle-level simulations, maintains
relative performance trends, and achieves 0.1% to 8% average error. With a case
study, we demonstrate Sparseloop's ability to help reveal important insights
for designing sparse tensor accelerators (e.g., it is important to co-design
orthogonal design aspects).
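To make the role of the stochastic tensor density models concrete, the sketch below estimates expected compute and energy for a MAC array when zero-based compute is eliminated, under a simple model that treats each operand element as nonzero with an independent probability. This is only an illustrative approximation of the idea described in the abstract, not Sparseloop's actual implementation; the function names and the per-action energy numbers are hypothetical.

```python
# Illustrative sketch (not Sparseloop's API): estimating expected compute and
# energy savings from zero-value skipping under a simple stochastic density
# model that treats each tensor element as nonzero with independent probability.

def expected_effective_macs(total_macs: int,
                            weight_density: float,
                            activation_density: float) -> float:
    """Expected number of MACs that must actually execute when compute is
    skipped whenever either operand is zero (both operands must be nonzero)."""
    return total_macs * weight_density * activation_density

def expected_energy(total_macs: int,
                    weight_density: float,
                    activation_density: float,
                    mac_energy_pj: float = 1.0,       # hypothetical per-MAC energy
                    skip_overhead_pj: float = 0.05):  # hypothetical per-skip check cost
    """Energy estimate accounting for both the savings from eliminated
    zero-based compute and the overhead of the skipping logic."""
    effective = expected_effective_macs(total_macs, weight_density, activation_density)
    skipped = total_macs - effective
    return effective * mac_energy_pj + skipped * skip_overhead_pj

# Example: a layer with 1e9 MACs, 30%-dense weights, 50%-dense activations.
macs = int(1e9)
print(f"effective MACs: {expected_effective_macs(macs, 0.3, 0.5):.3e}")
print(f"energy (pJ):    {expected_energy(macs, 0.3, 0.5):.3e}")
```

The independence assumption is the simplest possible density model; it captures why an analytical estimate can be orders of magnitude faster than cycle-level simulation, since no per-element traces are needed.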
Related papers
- Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators [33.18173790144853]
We present an automated generation approach for fast performance models to accurately estimate the latency of Deep Neural Networks (DNNs).
We modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array.
We evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions, achieving a significant speedup.
arXiv Detail & Related papers (2024-09-13T07:27:55Z) - KAPLA: Pragmatic Representation and Fast Solving of Scalable NN
Accelerator Dataflow [0.0]
We build a generic, optimized, and fast dataflow solver, KAPLA, to explore the design space with effective validity check and efficiency estimation.
KAPLA achieves energy overheads of only 2.2% and 7.7% on the resulting dataflows for training and inference, respectively.
It also outperforms random and machine-learning-based approaches, delivering more optimized results with orders-of-magnitude faster search.
arXiv Detail & Related papers (2023-06-09T03:12:42Z) - Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z) - Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z) - Correlating sparse sensing for large-scale traffic speed estimation: A
Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian enhanced low-rank tensor (LETC) framework featuring both lowrankness and multi-temporal correlations for large-scale traffic speed kriging.
We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z) - FaDIn: Fast Discretized Inference for Hawkes Processes with General
Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach yields better estimates of pattern latency than the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM) with data imputation.
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z) - Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network
Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26× less energy and offers up to 4× speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z) - Hardware Acceleration of Sparse and Irregular Tensor Computations of ML
Models: A Survey and Insights [18.04657939198617]
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of machine learning models on hardware accelerators.
It surveys different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs.
The takeaways include an understanding of the key challenges in accelerating sparse, irregular-shaped, and quantized tensors.
arXiv Detail & Related papers (2020-07-02T04:08:40Z)