The Sparsity Roofline: Understanding the Hardware Limits of Sparse
Neural Networks
- URL: http://arxiv.org/abs/2310.00496v2
- Date: Mon, 6 Nov 2023 19:48:05 GMT
- Title: The Sparsity Roofline: Understanding the Hardware Limits of Sparse
Neural Networks
- Authors: Cameron Shinn, Collin McCarthy, Saurav Muralidharan, Muhammad Osama,
John D. Owens
- Abstract summary: We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks.
We show how machine learning researchers can predict the performance of unimplemented or unoptimized block-structured sparsity patterns.
We show how hardware designers can predict the performance implications of new sparsity patterns and sparse data formats in hardware.
- Score: 4.130528857196844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Sparsity Roofline, a visual performance model for evaluating
sparsity in neural networks. The Sparsity Roofline jointly models network
accuracy, sparsity, and theoretical inference speedup. Our approach does not
require implementing and benchmarking optimized kernels, and the theoretical
speedup becomes equal to the actual speedup when the corresponding dense and
sparse kernels are well-optimized. We achieve this through a novel analytical
model for predicting sparse network performance, and validate the predicted
speedup using several real-world computer vision architectures pruned across a
range of sparsity patterns and degrees. We demonstrate the utility and
ease-of-use of our model through two case studies: (1) we show how machine
learning researchers can predict the performance of unimplemented or
unoptimized block-structured sparsity patterns, and (2) we show how hardware
designers can predict the performance implications of new sparsity patterns and
sparse data formats in hardware. In both scenarios, the Sparsity Roofline helps
performance experts identify sparsity regimes with the highest performance
potential.
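The theoretical speedup described in the abstract follows the standard roofline argument: a kernel's runtime is bounded below by the larger of its compute time and its memory-traffic time, and pruning shrinks both terms. The following is a minimal sketch of that idea only, not the paper's analytical model; the layer shape, the sparse-format overhead factor, and the hardware peak numbers are illustrative assumptions.

```python
# Minimal roofline-style speedup sketch (illustrative; NOT the paper's model).
# Assumptions: a single GEMM-like layer, FP16 operands, and hypothetical
# hardware peaks. The sparse-format overhead factor is a made-up knob that
# stands in for index/metadata bytes added by a sparse storage format.

def roofline_time(flops: float, bytes_moved: float,
                  peak_flops: float, peak_bandwidth: float) -> float:
    """Lower bound on kernel time: limited by either compute or memory traffic."""
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)


def predicted_speedup(flops: float, weight_bytes: float, act_bytes: float,
                      sparsity: float, format_overhead: float,
                      peak_flops: float, peak_bandwidth: float) -> float:
    """Theoretical speedup of a pruned layer over its dense counterpart.

    sparsity        -- fraction of weights removed (0.0 = dense, 0.9 = 90% pruned)
    format_overhead -- factor >= 1 modeling extra bytes from the sparse format
    """
    dense_time = roofline_time(flops, weight_bytes + act_bytes,
                               peak_flops, peak_bandwidth)
    sparse_flops = flops * (1.0 - sparsity)
    # Only weight traffic shrinks with pruning; activation traffic is unchanged.
    sparse_bytes = weight_bytes * (1.0 - sparsity) * format_overhead + act_bytes
    sparse_time = roofline_time(sparse_flops, sparse_bytes,
                                peak_flops, peak_bandwidth)
    return dense_time / sparse_time


if __name__ == "__main__":
    # Illustrative FP16 GEMM: activations (m x k) times weights (k x n).
    m, k, n = 256, 4096, 4096
    flops = 2.0 * m * k * n            # multiply-adds counted as 2 FLOPs
    weight_bytes = 2.0 * k * n         # FP16 weights
    act_bytes = 2.0 * (m * k + m * n)  # FP16 input + output activations
    peak_flops = 312e12                # ~312 TFLOP/s (illustrative)
    peak_bandwidth = 1.5e12            # ~1.5 TB/s (illustrative)
    for s in (0.5, 0.9, 0.95):
        speedup = predicted_speedup(flops, weight_bytes, act_bytes,
                                    s, 1.1, peak_flops, peak_bandwidth)
        print(f"sparsity {s:.0%}: predicted speedup {speedup:.2f}x")
```

In this toy setting the layer becomes memory-bound at high sparsity, so further pruning yields diminishing speedup; that kind of regime boundary is what the Sparsity Roofline is designed to make visible.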
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during model learning.
The framework then promotes model learning by paying closer attention to training samples whose explanations differ substantially.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Multi-conditioned Graph Diffusion for Neural Architecture Search [8.290336491323796]
We present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures.
We show promising results on six standard benchmarks, quickly yielding novel and unique architectures.
arXiv Detail & Related papers (2024-03-09T21:45:31Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search [0.0]
We introduce two generalizability strategies, which include fine-tuning using a base model trained on a specific hardware platform and NAS search space.
We provide a family of latency prediction models that achieve over 50% lower RMSE loss compared to ProxylessNAS.
arXiv Detail & Related papers (2021-01-04T00:48:09Z)
- Neural Architecture Optimization with Graph VAE [21.126140965779534]
We propose an efficient NAS approach to optimize network architectures in a continuous space.
The framework jointly learns four components: the encoder, the performance predictor, the complexity predictor and the decoder.
arXiv Detail & Related papers (2020-06-18T07:05:48Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 comprises a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.