OLLA: Decreasing the Memory Usage of Neural Networks by Optimizing the Lifetime and Location of Arrays
- URL: http://arxiv.org/abs/2210.12924v1
- Date: Mon, 24 Oct 2022 02:39:13 GMT
- Title: OLLA: Decreasing the Memory Usage of Neural Networks by Optimizing the Lifetime and Location of Arrays
- Authors: Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
- Abstract summary: OLLA is an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks.
- Score: 6.418232942455968
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The size of deep neural networks has grown exponentially in recent years.
Unfortunately, hardware devices have not kept pace with the rapidly increasing
memory requirements. To cope with this, researchers have turned to techniques
such as spilling and recomputation, which increase training time, or reduced
precision and model pruning, which can affect model accuracy. We present OLLA,
an algorithm that optimizes the lifetime and memory location of the tensors
used to train neural networks. Our method reduces the memory usage of existing
neural networks, without needing any modification to the models or their
training procedures. We formulate the problem as a joint integer linear program
(ILP). We present several techniques to simplify the encoding of the problem,
and enable our approach to scale to the size of state-of-the-art neural
networks using an off-the-shelf ILP solver. We experimentally demonstrate that
OLLA takes only minutes, if not seconds, to allow the training of neural networks
using one-third less memory on average.
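As a rough illustration of the formulation the abstract describes, the sketch below encodes only the memory-placement half of the problem (tensor lifetimes are taken as fixed) as an ILP solved with the off-the-shelf PuLP/CBC solver. The tensor names, sizes, and lifetimes are made up for the example, and this is not OLLA's actual joint encoding, which also reorders operations and therefore optimizes the lifetimes themselves.

```python
# Minimal memory-placement ILP sketch (not OLLA's actual formulation).
# Tensors with fixed lifetimes receive byte offsets in a shared buffer;
# tensors whose lifetimes overlap must not overlap in memory.
import pulp

# (size_in_bytes, first_use_step, last_use_step) -- hypothetical values
tensors = {
    "act0":  (4096, 0, 3),
    "act1":  (8192, 1, 4),
    "grad0": (4096, 4, 5),
}

def lifetimes_overlap(a, b):
    # Two tensors conflict unless one dies before the other is created.
    return not (a[2] < b[1] or b[2] < a[1])

# Big-M constant: the total size is a trivial upper bound on any offset.
M = sum(size for size, _, _ in tensors.values())

prob = pulp.LpProblem("tensor_placement", pulp.LpMinimize)
offset = {t: pulp.LpVariable(f"offset_{t}", lowBound=0, cat="Integer")
          for t in tensors}
peak = pulp.LpVariable("peak_memory", lowBound=0, cat="Integer")
prob += peak  # objective: minimize peak memory usage

names = list(tensors)
for i, a in enumerate(names):
    size_a = tensors[a][0]
    prob += offset[a] + size_a <= peak  # every tensor fits under the peak
    for b in names[i + 1:]:
        if not lifetimes_overlap(tensors[a], tensors[b]):
            continue  # disjoint lifetimes may freely reuse the same addresses
        size_b = tensors[b][0]
        # Binary disjunction: either a sits entirely below b, or b below a.
        below = pulp.LpVariable(f"below_{a}_{b}", cat="Binary")
        prob += offset[a] + size_a <= offset[b] + M * (1 - below)
        prob += offset[b] + size_b <= offset[a] + M * below

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("peak memory:", pulp.value(peak))
for t in names:
    print(t, "-> offset", pulp.value(offset[t]))
```

In this toy instance, act0 and grad0 have disjoint lifetimes, so the solver can place them at the same offset and the peak drops from 16 KiB (the sum of all sizes) to 12 KiB; the big-M disjunction is what prevents tensors with overlapping lifetimes from sharing addresses.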
Related papers
- RelChaNet: Neural Network Feature Selection using Relative Change Scores [0.0]
We introduce RelChaNet, a novel and lightweight feature selection algorithm that uses neuron pruning and regrowth in the input layer of a dense neural network.
Our approach generally outperforms the current state-of-the-art methods, and in particular improves the average accuracy by 2% on the MNIST dataset.
arXiv Detail & Related papers (2024-10-03T09:56:39Z)
- Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations [3.4328283704703866]
This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning tasks.
It significantly reduces the number of multiplications when running neural networks.
arXiv Detail & Related papers (2022-01-07T18:09:23Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Binary Neural Networks for Memory-Efficient and Effective Visual Place Recognition in Changing Environments [24.674034243725455]
Visual place recognition (VPR) is a robot's ability to determine, from visual data, whether a place has been visited before.
CNN-based approaches are unsuitable for resource-constrained platforms, such as small robots and drones.
We propose a new class of highly compact models that drastically reduces the memory requirements and computational effort.
arXiv Detail & Related papers (2020-10-01T22:59:34Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition.
It is challenging to deploy these networks under limited computational resources, such as in mobile devices.
We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
arXiv Detail & Related papers (2020-01-01T15:04:43Z)