Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural
Networks for Edge Devices
- URL: http://arxiv.org/abs/2003.02369v1
- Date: Wed, 4 Mar 2020 23:38:54 GMT
- Title: Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural
Networks for Edge Devices
- Authors: Byung Hoon Ahn, Jinwon Lee, Jamie Menjay Lin, Hsin-Pai Cheng, Jilei
Hou, Hadi Esmaeilzadeh
- Abstract summary: We present a memory-aware compiler, dubbed SERENITY, that finds a schedule with the optimal memory footprint.
Our solution also comprises a graph rewriting technique that allows further reduction beyond the optimum.
- Score: 10.876317610988059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances demonstrate that irregularly wired neural networks from
Neural Architecture Search (NAS) and Random Wiring can not only automate the
design of deep neural networks but also emit models that outperform previous
manual designs. This approach is especially effective when designing neural
architectures under hard resource constraints (memory, MACs, ...), which
highlights the importance of this class of neural network design. However,
such a move creates complication in the previously streamlined pattern of
execution. In fact, one of the main challenges is that the execution order of
the nodes in the neural network significantly affects the memory footprint of
the intermediate activations. Current compilers do not schedule with regard to
activation memory footprint, which significantly increases its peak compared
to the optimum and renders such models inapplicable to edge devices. To address this
standing issue, we present a memory-aware compiler, dubbed SERENITY, that
utilizes dynamic programming to find a schedule with the optimal memory
footprint. Our solution also comprises a graph rewriting technique that
allows further reduction beyond this optimum. As such, SERENITY
achieves optimal peak memory, and the graph rewriting technique further
improves this, resulting in a 1.68x improvement with the dynamic programming-based
scheduler and 1.86x with graph rewriting, against TensorFlow Lite, with less
than one minute of scheduling overhead.
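To make the scheduling idea concrete, the following is a minimal Python sketch of dynamic-programming-based memory-aware scheduling. It is only an illustration under assumed node names, activation sizes, and a simplified liveness model, not SERENITY's actual algorithm or code.

```python
# Illustrative sketch only (not SERENITY itself): dynamic-programming-based
# memory-aware scheduling of a tiny, hypothetical irregularly wired graph.
# The DP memoizes on the set of not-yet-executed nodes and returns a
# topological order minimizing the peak sum of live activation sizes.
from functools import lru_cache

# Hypothetical DAG: node -> consumers, and the size of each node's output.
edges = {"a": ["b"], "b": ["e"], "c": ["d"], "d": ["e"], "e": []}
size = {"a": 10, "b": 1, "c": 1, "d": 1, "e": 1}
consumers = {n: set(cs) for n, cs in edges.items()}
producers = {n: {p for p, cs in edges.items() if n in cs} for n in edges}

@lru_cache(maxsize=None)
def best(remaining):
    """(peak memory, schedule) for executing the `remaining` nodes."""
    if not remaining:
        return 0, ()
    done = set(edges) - set(remaining)
    # Activations still live: already produced, but some consumer not yet run.
    live = sum(size[n] for n in done if consumers[n] & set(remaining))
    best_peak, best_order = float("inf"), ()
    for n in remaining:
        if producers[n] <= done:                # all inputs of n are ready
            step_peak = live + size[n]          # inputs stay live while n runs
            rest_peak, rest_order = best(remaining - {n})
            peak = max(step_peak, rest_peak)
            if peak < best_peak:
                best_peak, best_order = peak, (n,) + rest_order
    return best_peak, best_order

peak, order = best(frozenset(edges))
print("peak activation memory:", peak)   # 11 here; a careless order reaches 12
print("schedule:", order)
```

The memoization key is only the set of unscheduled nodes, so different prefixes reaching the same state are explored once; per the abstract, SERENITY additionally applies graph rewriting to push peak memory below what any ordering of the original graph can reach.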
Related papers
- Memory-aware Scheduling for Complex Wired Networks with Iterative Graph
Optimization [4.614780125575351]
We propose an efficient memory-aware scheduling framework based on iterative graph optimization.
Our framework features an iterative graph fusion algorithm that simplifies the graph while preserving the scheduling optimality.
arXiv Detail & Related papers (2023-08-26T14:52:02Z)
- Towards Zero Memory Footprint Spiking Neural Network Training [7.4331790419913455]
Spiking Neural Networks (SNNs) process information using discrete-time events known as spikes rather than continuous values.
In this paper, we introduce an innovative framework characterized by a remarkably low memory footprint.
Our design is able to achieve a $\mathbf{58.65\times}$ reduction in memory usage compared to the current SNN node.
arXiv Detail & Related papers (2023-08-16T19:49:24Z)
- MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MF-NeRF achieves the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z)
- OLLA: Decreasing the Memory Usage of Neural Networks by Optimizing the
Lifetime and Location of Arrays [6.418232942455968]
OLLA is an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks.
arXiv Detail & Related papers (2022-10-24T02:39:13Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds; a toy sketch of the hash encoding follows this entry.
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
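As a rough illustration of the mechanism summarized in the entry above, the following toy Python sketch builds a 2D multiresolution hash encoding. The table size, resolutions, hash function, and the omission of the downstream MLP are assumptions for brevity; this is not the paper's implementation.

```python
# Toy sketch (an assumption, not the paper's CUDA implementation) of a 2D
# multiresolution hash encoding: each level hashes integer grid-corner
# coordinates into a small table of trainable feature vectors and bilinearly
# interpolates them; per-level features are concatenated and would then be
# fed to a small MLP (omitted here).
import numpy as np

L, T, F = 4, 2**10, 2            # levels, table entries per level, features per entry
base_res, growth = 4, 2.0        # coarsest grid resolution and per-level growth
rng = np.random.default_rng(0)
tables = rng.uniform(-1e-4, 1e-4, size=(L, T, F))   # the trainable parameters

def hash_corner(ix, iy):
    # Toy stand-in for a spatial hash of the integer corner coordinates.
    return hash((ix, iy)) % T

def encode(x, y):
    """Map a point in [0, 1]^2 to a concatenated multiresolution feature vector."""
    feats = []
    for level in range(L):
        res = int(base_res * growth ** level)
        gx, gy = x * res, y * res
        ix, iy = int(gx), int(gy)
        tx, ty = gx - ix, gy - iy
        f = np.zeros(F)
        # Bilinear interpolation of the four hashed corner entries.
        for dx, wx in ((0, 1 - tx), (1, tx)):
            for dy, wy in ((0, 1 - ty), (1, ty)):
                f += wx * wy * tables[level, hash_corner(ix + dx, iy + dy)]
        feats.append(f)
    return np.concatenate(feats)      # length L * F, the MLP input

print(encode(0.3, 0.7).shape)         # (8,)
```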
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer; a minimal fixed-point sketch follows this entry.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training, and gradient-based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
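To illustrate the fixed-point formulation mentioned in the entry above, here is a minimal Python sketch of a deep equilibrium forward pass under assumed dimensions and a naive fixed-point iteration. It is not the paper's solver, which relies on root finding and implicit differentiation for training.

```python
# Hedged sketch (an assumption, not the paper's solver) of a deep equilibrium
# "layer": the output is the fixed point z* of a single nonlinear map
# z = f(z, x), found here by plain fixed-point iteration.
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_in = 8, 4
W = rng.normal(scale=0.1, size=(d_hidden, d_hidden))  # small norm -> contraction
U = rng.normal(scale=0.5, size=(d_hidden, d_in))
b = np.zeros(d_hidden)

def f(z, x):
    # The single nonlinear layer that is applied until equilibrium.
    return np.tanh(W @ z + U @ x + b)

def deq_forward(x, tol=1e-8, max_iter=500):
    """Iterate z <- f(z, x) until it (approximately) reaches z* = f(z*, x)."""
    z = np.zeros(d_hidden)
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

x = rng.normal(size=d_in)
z_star = deq_forward(x)
print("fixed-point residual:", np.linalg.norm(f(z_star, x) - z_star))
```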
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.