Moccasin: Efficient Tensor Rematerialization for Neural Networks
- URL: http://arxiv.org/abs/2304.14463v2
- Date: Tue, 30 May 2023 18:12:29 GMT
- Title: Moccasin: Efficient Tensor Rematerialization for Neural Networks
- Authors: Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra
Dilkina
- Abstract summary: We develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables.
We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment and training of neural networks on edge computing devices pose
many challenges. The limited memory of edge devices is often one of the
biggest factors restricting the deployment of large neural network
models. Tensor rematerialization, or recompute, is a way to address high memory
requirements for neural network training and inference. In this paper we
consider the problem of execution time minimization of compute graphs subject
to a memory budget. In particular, we develop a new constraint programming
formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where
$n$ is the number of nodes in the compute graph. This is a significant
improvement over the works in the recent literature that propose formulations
with $O(n^2)$ Boolean variables. We present numerical studies showing that
our approach is up to an order of magnitude faster than recent work,
especially for large-scale graphs.
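To make the idea concrete, here is a minimal, hypothetical sketch of the paper's core modeling device, one retention-interval end variable per node with the memory budget enforced by a single cumulative constraint, written against Google OR-Tools CP-SAT. The toy graph, sizes, and budget are invented, and the recompute intervals of the full \textsc{Moccasin} formulation are omitted.

```python
# Sketch only: O(n) integer "end" variables, one per node, plus a single
# cumulative constraint for the memory budget. Recompute (a node occupying
# several retention intervals) is omitted; all data below is made up.
from ortools.sat.python import cp_model

# Toy compute graph in topological order 0..n-1: node -> consumers.
consumers = {0: [1, 3], 1: [2], 2: [3], 3: []}
mem = {0: 4, 1: 2, 2: 3, 3: 1}   # output size of each node
budget = 9                        # hypothetical peak-memory budget
n = len(consumers)

model = cp_model.CpModel()
ends, intervals = [], []
for v, cons in consumers.items():
    last_use = max(cons) + 1 if cons else v + 1
    end = model.NewIntVar(last_use, n, f"end_{v}")   # retention-interval end
    size = model.NewIntVar(1, n - v, f"size_{v}")
    intervals.append(model.NewIntervalVar(v, size, end, f"live_{v}"))
    ends.append(end)

# One cumulative constraint keeps total live-tensor memory under the budget.
model.AddCumulative(intervals, [mem[v] for v in consumers], budget)
model.Minimize(sum(ends))  # free each output as early as possible

solver = cp_model.CpSolver()
status = solver.Solve(model)
print(status == cp_model.OPTIMAL, [solver.Value(e) for e in ends])
```

Lowering `budget` below the graph's true peak makes the model infeasible, which is exactly the budget-checking behavior the formulation relies on.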
Related papers
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node
Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
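For readers unfamiliar with the underlying trick, here is a tiny sketch of plain Gumbel-Softmax sampling, the operator NodeFormer kernelizes (the kernelized linear-attention part is not shown, and the toy logits are invented):

```python
# Differentiable near-one-hot sampling over candidate edges (sketch only).
import torch

def gumbel_softmax(logits, tau=0.5):
    # Add Gumbel(0, 1) noise, then soften with temperature tau:
    # as tau -> 0, each row approaches a one-hot sample.
    u = torch.rand_like(logits).clamp_min(1e-9)
    g = -torch.log(-torch.log(u))
    return torch.softmax((logits + g) / tau, dim=-1)

scores = torch.randn(5, 5)          # toy pairwise node-affinity logits
soft_adj = gumbel_softmax(scores)   # differentiable, near-discrete edges
```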
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Bypass Exponential Time Preprocessing: Fast Neural Network Training via
Weight-Data Correlation Preprocessing [16.35997749365563]
State-of-the-art deep neural networks grow larger every year to deliver increasing model accuracy.
Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces the per-iteration training cost from $\Theta(nmd)$ to $o(nmd)$, but requires exponential time to preprocess either the data or the neural network weights.
We present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration.
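As an illustration of the data-structure idea (the details below are my assumptions, not the paper's exact algorithm), keeping the correlations sorted lets each iteration find the firing neurons by binary search instead of a full scan:

```python
# Sketch: find neurons r with <w_r, x> above a threshold b via binary search.
import bisect

corr = sorted((wx, r) for r, wx in enumerate([0.1, -0.4, 0.9, 0.3]))
b = 0.2                                      # firing threshold (toy value)
lo = bisect.bisect_right(corr, (b, float("inf")))
firing = [r for _, r in corr[lo:]]           # here: neurons 3 and 2
```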
arXiv Detail & Related papers (2022-11-25T16:40:49Z) - OLLA: Decreasing the Memory Usage of Neural Networks by Optimizing the
Lifetime and Location of Arrays [6.418232942455968]
OLLA is an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks.
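A toy illustration of the lifetime/location interplay (OLLA itself solves this jointly and exactly with an integer program; the first-fit greedy and data here are only a sketch):

```python
# Assign each tensor a fixed offset so that tensors whose lifetimes
# overlap never overlap in memory. Lifetimes are half-open [start, end).
tensors = [("a", 0, 3, 4), ("b", 1, 2, 2), ("c", 2, 4, 3)]  # name, start, end, size
placed = []  # (offset, size, start, end)
for name, s, e, sz in tensors:
    off = 0
    for o, osz, ps, pe in sorted(placed):
        lifetimes_overlap = s < pe and ps < e
        addresses_overlap = off < o + osz and o < off + sz
        if lifetimes_overlap and addresses_overlap:
            off = o + osz  # bump above the conflicting block
    placed.append((off, sz, s, e))
    print(f"{name}: offset {off}")
```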
arXiv Detail & Related papers (2022-10-24T02:39:13Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new training method that views neural networks as a set of binary search trees, where each iteration corresponds to modifying a small subset of nodes in the trees.
We believe this view will have further applications in the analysis and design of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Graph Neural Network Bandits [89.31889875864599]
We consider the bandit optimization problem with the reward function defined over graph-structured data.
Key challenges in this setting are scaling to large domains and to graphs with many nodes.
We show that graph neural networks (GNNs) can be used to estimate the reward function.
arXiv Detail & Related papers (2022-07-13T18:12:36Z) - Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
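The compression mechanism is easy to picture with a standard vector-quantization lookup (a generic sketch, not the paper's exact auto-decoder; sizes are invented): each grid cell stores one small integer index instead of a full feature vector.

```python
# Sketch: a learned codebook turns an integer grid back into features.
import torch

codebook = torch.nn.Embedding(64, 16)       # 64 codes, 16-dim features
indices = torch.randint(0, 64, (32, 32))    # compressed feature grid
features = codebook(indices)                # decoded 32x32x16 grid on the fly
```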
arXiv Detail & Related papers (2022-06-15T17:58:34Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Does Preprocessing Help Training Over-parameterized Neural Networks? [19.64638346701198]
We propose two novel preprocessing ideas to bypass the $\Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
arXiv Detail & Related papers (2021-10-09T18:16:23Z) - ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation [5.35599092568615]
This paper studies the power of artificial neural networks with rectified linear units.
We show that two fundamental optimization problems can be solved with neural networks of size $\mathcal{O}(m^2 n^2)$.
arXiv Detail & Related papers (2021-02-12T17:23:34Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
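Such strategies typically build on sign binarization with a straight-through estimator; a generic sketch of that primitive (not the paper's specific models) is:

```python
# Sketch: binarize to +/-1 in the forward pass, pass gradients through
# (clipped to |x| <= 1) in the backward pass.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1)

w = torch.randn(3, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()   # w.grad is the clipped pass-through
```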
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
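A minimal sketch of a degree-2 block in this spirit (structure and sizes are my simplification, not the paper's exact architecture): the Hadamard product of two linear maps makes the output quadratic in the input.

```python
# Sketch: one quadratic (degree-2) polynomial block with a residual path.
import torch

class Degree2Block(torch.nn.Module):
    def __init__(self, d):
        super().__init__()
        self.a = torch.nn.Linear(d, d)
        self.b = torch.nn.Linear(d, d)

    def forward(self, z):
        return z + self.a(z) * self.b(z)  # quadratic via Hadamard product

y = Degree2Block(32)(torch.randn(8, 32))
```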
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.