YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs
- URL: http://arxiv.org/abs/2310.00574v3
- Date: Thu, 23 Nov 2023 16:24:39 GMT
- Title: YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs
- Authors: Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard,
Yanjing Li
- Abstract summary: We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
- Score: 3.1445034800095413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenges associated with deploying neural networks on CPUs,
with a particular focus on minimizing inference time while maintaining
accuracy. Our novel approach is to use the dataflow (i.e., computation order)
of a neural network to explore data reuse opportunities using heuristic-guided
analysis and a code generation framework, which enables exploration of various
Single Instruction, Multiple Data (SIMD) implementations to achieve optimized
neural network execution. Our results demonstrate that the dataflow that keeps
outputs in SIMD registers while also maximizing both input and weight reuse
consistently yields the best performance for a wide variety of inference
workloads, achieving up to a 3x speedup for 8-bit neural networks and up to a
4.8x speedup for binary neural networks over today's optimized neural network
implementations.
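To make the winning dataflow concrete, below is a minimal sketch in C of an
output-stationary SIMD inner loop, i.e., the pattern the abstract describes:
output accumulators pinned in a SIMD register, one weight broadcast per tap,
and overlapping input loads. The 1-D convolution shape, the AVX2 target, the
function name, and the assumption that len is a multiple of 8 are all
illustrative; this is not the paper's generated code.

```c
#include <immintrin.h>
#include <stdint.h>

/* Output-stationary 1-D convolution sketch (AVX2, 32-bit lanes).
 * A vector of 8 output accumulators stays pinned in a SIMD register
 * for the whole reduction; each weight is broadcast once per tap
 * (weight reuse across 8 outputs) and the per-tap input loads overlap
 * (input reuse across taps). Assumes len is a multiple of 8. */
void conv1d_output_stationary(const int32_t *input,   /* len + K - 1 elems */
                              const int32_t *weights, /* K elems */
                              int32_t *output,        /* len elems */
                              int len, int K)
{
    for (int o = 0; o < len; o += 8) {
        __m256i acc = _mm256_setzero_si256();          /* outputs live here */
        for (int k = 0; k < K; k++) {
            __m256i w = _mm256_set1_epi32(weights[k]); /* broadcast one weight */
            __m256i x = _mm256_loadu_si256(
                (const __m256i *)(input + o + k));     /* sliding input window */
            acc = _mm256_add_epi32(acc, _mm256_mullo_epi32(x, w));
        }
        _mm256_storeu_si256((__m256i *)(output + o), acc); /* one store per tile */
    }
}
```

Keeping acc in a register for the entire reduction avoids a
load-accumulate-store round trip through memory on every tap, which is why
this dataflow tends to win once input and weight reuse are also maximized.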
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- SparseProp: Efficient Event-Based Simulation and Training of Sparse Recurrent Spiking Neural Networks [4.532517021515834]
Spiking Neural Networks (SNNs) are biologically-inspired models that are capable of processing information in streams of action potentials.
We introduce SparseProp, a novel event-based algorithm for simulating and training sparse SNNs.
arXiv Detail & Related papers (2023-12-28T18:48:10Z)
- Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network [20.05283214295881]
Spiking neural networks (SNNs) are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors.
We develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks.
We harness the adaptive threshold which improves network accuracy, sparsity and robustness in streaming inference.
arXiv Detail & Related papers (2023-04-24T07:12:50Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a new variant of deep convolutional neural network architecture to improve the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) demands substantial computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (see the worked sketch after this list).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
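As referenced above, here is a small worked example of the {-1, +1} encoding
idea from the "Quantized Neural Networks via {-1, +1} Encoding Decomposition"
entry. It assumes a standard m-bit binary encoding in which a quantized weight
q is restricted to the odd levels {-(2^m - 1), ..., -1, +1, ..., 2^m - 1}, so
that q = sum_{i=0}^{m-1} 2^i * b_i with each b_i in {-1, +1}; the exact scheme
in that paper may differ, and all names below are illustrative.

```c
#include <stdio.h>

/* Greedily derive binary digits b_i in {-1, +1} such that
 * q == sum_{i=0}^{m-1} (1 << i) * b_i, for odd q in [-(2^m - 1), 2^m - 1]. */
static void decompose_pm1(int q, int m, int b[])
{
    for (int i = m - 1; i >= 0; i--) {
        b[i] = (q >= 0) ? +1 : -1; /* step toward the remaining target */
        q -= b[i] * (1 << i);
    }
    /* q is now exactly 0 for every odd value in range */
}

int main(void)
{
    int b[3];
    for (int q = -7; q <= 7; q += 2) { /* all odd 3-bit levels */
        decompose_pm1(q, 3, b);
        printf("%+d = %+d*4 %+d*2 %+d*1\n", q, b[2], b[1], b[0]);
    }
    return 0;
}
```

Given such digits, a quantized layer W*x decomposes as sum_i 2^i * (B_i * x),
i.e., m binary-weight branches whose outputs are recombined with fixed shifts,
which is what turns a QNN into the multi-branch binary networks the entry
describes.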
This list was automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.