Computation on Sparse Neural Networks: an Inspiration for Future Hardware
- URL: http://arxiv.org/abs/2004.11946v1
- Date: Fri, 24 Apr 2020 19:13:50 GMT
- Title: Computation on Sparse Neural Networks: an Inspiration for Future Hardware
- Authors: Fei Sun, Minghai Qin, Tianyun Zhang, Liu Liu, Yen-Kuang Chen, Yuan Xie
- Abstract summary: We describe the current status of the research on the computation of sparse neural networks.
We discuss how model accuracy is influenced by the number of weight parameters and by the structure of the model.
We show that for practically complicated problems, it is more beneficial to search for large, sparse models in the weight-dominated region.
- Score: 20.131626638342706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network models are widely used to solve many challenging problems,
such as computer vision, personalized recommendation, and natural language
processing. These models are computationally intensive and push against the
hardware limits of existing server and IoT devices. Thus, finding model
architectures that require far less computation while preserving accuracy as
much as possible is a popular research topic. Among the various mechanisms
that aim to reduce computational complexity, identifying the zero values in
the model weights and activations so that the corresponding computation can be
skipped is a promising direction.
In this paper, we summarize the current status of research on the computation
of sparse neural networks from the perspectives of sparse algorithms, software
frameworks, and hardware acceleration. We observe that searching for sparse
structures can serve as a general methodology for high-quality model
exploration, in addition to being a strategy for high-efficiency model
execution. We discuss how model accuracy is influenced by the number of weight
parameters and by the structure of the model; the corresponding models are said
to lie in the weight-dominated and structure-dominated regions, respectively.
We show that for practically complicated problems, it is more beneficial to
search for large, sparse models in the weight-dominated region. To achieve this
goal, new approaches are required to search for proper sparse structures, and
new sparse-training hardware needs to be developed to enable fast iteration on
sparse models.
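To make the zero-skipping idea concrete, here is a minimal sketch (our illustration, not taken from the paper) of magnitude pruning followed by a sparse matrix-vector product that touches only the stored nonzeros; the layer size, the 90% sparsity target, and the use of SciPy's CSR format are assumptions for the example.

```python
# Illustrative sketch (not from the paper): exploit zeros in weights by
# pruning small-magnitude entries and computing only on the nonzeros.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # dense weight matrix
x = rng.standard_normal(1024).astype(np.float32)          # input activation

# Magnitude pruning: zero out the 90% smallest-magnitude weights
# (the 90% sparsity target is an assumed, illustrative value).
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Store only the nonzeros (CSR), so the matvec skips zero entries entirely.
W_csr = csr_matrix(W_pruned)
y_sparse = W_csr @ x
y_dense = W_pruned @ x

print(f"nonzero weights: {W_csr.nnz} / {W.size}")
print("results agree:", np.allclose(y_sparse, y_dense, atol=1e-4))
```

In CSR form only the nonzero values and their indices are stored, so the multiply-accumulate work scales with the number of nonzeros rather than with the full weight count.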
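As a back-of-the-envelope illustration of the weight-dominated versus structure-dominated distinction (the layer sizes and sparsity level below are assumed, not from the paper), a large sparse layer and a small dense layer can carry the same number of nonzero weights while spanning very different structures:

```python
# Illustrative arithmetic (assumed layer sizes, not from the paper):
# equal nonzero-weight budget, very different model structure.
dense_in, dense_out = 256, 256
sparse_in, sparse_out = 1024, 1024
sparsity = 0.9375  # fraction of weights pruned in the large layer

dense_params = dense_in * dense_out                             # 65,536 nonzeros
sparse_params = int(sparse_in * sparse_out * (1.0 - sparsity))  # 65,536 nonzeros

print(f"small dense layer : {dense_params} weights, structure 256x256")
print(f"large sparse layer: {sparse_params} nonzeros, structure 1024x1024")
# With the same nonzero budget, the sparse layer spans a 16x larger weight
# space; the paper argues that searching such large, sparse models in the
# weight-dominated region is more beneficial for practically complicated
# problems.
```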
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - A method for quantifying the generalization capabilities of generative models for solving Ising models [5.699467840225041]
We use a Hamming distance regularizer to quantify the generalization capabilities of various network architectures combined with VAN.
We conduct numerical experiments on several network architectures combined with VAN, including feed-forward neural networks, recurrent neural networks, and graph neural networks.
Our method can assist neural architecture search in finding optimal network architectures for solving large-scale Ising models.
arXiv Detail & Related papers (2024-05-06T12:58:48Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks [20.374784902476318]
Pruning, as a method to introduce zeros into model weights, has been shown to provide good trade-offs between model accuracy and computational efficiency.
Some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and store operations on such memories.
In this work, we propose a set of novel sparse patterns, named gather-scatter (GS) patterns, to utilize the scratchpad memories and gather/scatter engines to speed up neural network inference (a generic gather-style sketch of this kind of indexed computation appears after this list).
arXiv Detail & Related papers (2021-12-20T22:55:45Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Balancing Accuracy and Latency in Multipath Neural Networks [0.09668407688201358]
We use a one-shot neural architecture search model to implicitly evaluate the performance of an intractable number of neural networks.
We show that our method can accurately model the relative performance between models with different latencies and predict the performance of unseen models with good precision across different datasets.
arXiv Detail & Related papers (2021-04-25T00:05:48Z) - The Untapped Potential of Off-the-Shelf Convolutional Neural Networks [29.205446247063673]
We show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet.
This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
arXiv Detail & Related papers (2021-03-17T20:04:46Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
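As a generic illustration of the gather style of computation mentioned in the load-balanced gather-scatter entry above (this is not that paper's load-balanced GS pattern; the sizes and indices are assumed), a sparse dot product can gather just the activations selected by the nonzero weight indices:

```python
# Generic illustration (not the GS patterns from the paper above): a sparse
# dot product computed gather-style, i.e. indirect loads of activations at
# the column indices of the stored nonzero weights.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4096).astype(np.float32)    # activations

# Stored sparse row: nonzero weight values and their column indices.
cols = rng.choice(4096, size=256, replace=False)     # indices of nonzeros
vals = rng.standard_normal(256).astype(np.float32)   # nonzero weight values

gathered = x[cols]          # "gather": indirect load of only the needed activations
y = np.dot(vals, gathered)  # multiply-accumulate over nonzeros only

# Reference check against the equivalent dense computation.
w_dense = np.zeros(4096, dtype=np.float32)
w_dense[cols] = vals
assert np.allclose(y, np.dot(w_dense, x), atol=1e-4)
print("sparse gather result:", float(y))
```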
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.