Analysis of memory consumption by neural networks based on
hyperparameters
- URL: http://arxiv.org/abs/2110.11424v1
- Date: Thu, 21 Oct 2021 18:49:44 GMT
- Title: Analysis of memory consumption by neural networks based on
hyperparameters
- Authors: Mahendran N
- Abstract summary: We propose a generic analysis of memory consumption while training deep learning models.
Changes in the hyperparameters and in the number of hidden layers are the variables considered in this proposed approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning models are trained and deployed in many domains. Their
growing use draws attention to the memory they consume during computation.
Existing approaches for reducing memory consumption, such as model compression
and hardware changes, are specific to particular models or platforms. We propose
a generic analysis of the memory consumed while training deep learning models,
studied in relation to the hyperparameters used for training. Hyperparameters,
which include the learning rate, batch size, number of hidden layers, and depth
of layers, determine the model's performance and accuracy. We treat the
optimizer and the type of hidden layers as known, fixed choices. Changes in the
hyperparameters and in the number of hidden layers are the variables considered
in this approach. To better understand the computational cost, the proposed
analysis focuses on how memory consumption changes with respect to these
hyperparameters. The result is a general analysis of how memory consumption
changes during training when the set of hyperparameters is altered.
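The paper describes this analysis at a conceptual level and does not ship code. The sketch below is one way such a measurement could be set up, assuming PyTorch on a CUDA device; the model builder, the fixed SGD optimizer, the sweep values, and the names make_mlp and peak_training_memory_mib are all illustrative choices, not the paper's experimental setup.

```python
# Illustrative sketch (not the paper's code): sweep two hyperparameters,
# batch size and number of hidden layers, and record peak training memory.
import torch
import torch.nn as nn

def make_mlp(n_hidden_layers, width=1024, n_in=1024, n_out=10):
    layers, d = [], n_in
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, n_out))
    return nn.Sequential(*layers)

def peak_training_memory_mib(n_hidden_layers, batch_size, lr=1e-3, steps=5):
    device = "cuda"
    model = make_mlp(n_hidden_layers).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # optimizer treated as fixed/known
    loss_fn = nn.CrossEntropyLoss()
    torch.cuda.reset_peak_memory_stats(device)
    for _ in range(steps):
        x = torch.randn(batch_size, 1024, device=device)
        y = torch.randint(0, 10, (batch_size,), device=device)
        opt.zero_grad(set_to_none=True)
        loss_fn(model(x), y).backward()
        opt.step()
    return torch.cuda.max_memory_allocated(device) / 2**20

if __name__ == "__main__":
    for depth in (2, 4, 8):            # number of hidden layers
        for bs in (32, 128, 512):      # batch size
            mib = peak_training_memory_mib(depth, bs)
            print(f"hidden layers={depth:2d}  batch size={bs:4d}  peak={mib:8.1f} MiB")
```

Each (depth, batch size) pair yields one peak-memory reading; tabulating those readings against the swept hyperparameters is the kind of comparison the abstract describes.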
Related papers
- Memory Layers at Scale [67.00854080570979]
This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale.
On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the budget, as well as mixture-of-expert models when matched for both compute and parameters.
We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters.
arXiv Detail & Related papers (2024-12-12T23:56:57Z)
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and involve long-term dependencies.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- Replacement Learning: Training Vision Tasks with Fewer Learnable Parameters [4.2114456503277315]
Replacement Learning replaces all parameters of frozen layers with only two learnable parameters.
We conduct experiments across four benchmark datasets, including CIFAR-10, STL-10, SVHN, and ImageNet.
Our approach reduces the number of parameters, training time, and memory consumption while surpassing the performance of end-to-end training.
arXiv Detail & Related papers (2024-10-02T05:03:54Z)
- Lowering PyTorch's Memory Consumption for Selective Differentiation [2.424775261485421]
PyTorch's current AD implementation neglects information about parameter differentiability when storing the graph.
We provide a drop-in, differentiability-agnostic implementation of such layers and demonstrate its ability to reduce memory without affecting run time.
arXiv Detail & Related papers (2024-04-15T22:53:30Z)
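The selective-differentiation entry above turns on a specific PyTorch detail: for layers such as nn.Linear, the input is saved during the forward pass because it is needed for the weight gradient, and stock autograd saves it even when the weight is frozen. The sketch below is not the authors' drop-in package; it is a minimal custom autograd.Function illustrating the underlying idea, deciding what to save from the weight's requires_grad flag.

```python
# Minimal sketch of a differentiability-aware linear op (not the authors'
# implementation): the input is saved for backward only when the weight
# gradient will actually be needed, i.e. when the weight is trainable.
import torch

class FrugalLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.weight_needs_grad = weight.requires_grad
        if ctx.weight_needs_grad:
            ctx.save_for_backward(x, weight)   # input needed for d(loss)/d(weight)
        else:
            ctx.save_for_backward(weight)      # weight alone suffices for d(loss)/dx
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        if ctx.weight_needs_grad:
            x, weight = ctx.saved_tensors
            return grad_out @ weight, grad_out.t() @ x
        (weight,) = ctx.saved_tensors
        return grad_out @ weight, None

# Usage (2D shapes: x is (batch, in_features), weight is (out_features, in_features)):
# y = FrugalLinearFn.apply(x, weight)
```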
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
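WTA-CRS itself is not reproduced here. As a hedged illustration of the family it belongs to, the sketch below implements the classical column-row sampling (CRS) estimator for a matrix product, an unbiased baseline of the kind WTA-CRS is designed to improve on; the function name and sampling details are illustrative.

```python
# Sketch of the classical column-row sampling (CRS) estimator for a matrix
# product (this is not the WTA-CRS estimator itself).
import torch

def crs_matmul(A: torch.Tensor, B: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Unbiased estimate of A @ B built from n_samples column/row pairs."""
    # Sample index j with probability proportional to ||A[:, j]|| * ||B[j, :]||.
    probs = A.norm(dim=0) * B.norm(dim=1)
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, n_samples, replacement=True)
    # Rescale each sampled outer product so the expectation equals A @ B.
    scale = 1.0 / (n_samples * probs[idx])
    return (A[:, idx] * scale) @ B[idx, :]

# Averaging crs_matmul(A, B, 64) over many trials converges to A @ B, but any
# single estimate is noisy; that variance is what WTA-CRS targets.
```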
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget.
We show that, when the model size is counted into the total budget and methods are compared at aligned memory cost, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Training Neural Networks with Fixed Sparse Masks [19.58969772430058]
Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
arXiv Detail & Related papers (2021-11-18T18:06:01Z)
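As a rough illustration of the fixed-sparse-mask idea in the entry above, the sketch below scores parameters once on a calibration batch (squared gradients stand in for the Fisher-style criterion used in the paper), freezes the resulting mask, and zeroes all other gradients at every step. The function names and keep fraction are illustrative, not the authors' implementation.

```python
# Sketch of fine-tuning with a fixed sparse mask (not the authors' code).
# Assumes every parameter receives a gradient from the calibration loss.
import torch

def build_masks(model, calibration_loss, keep_fraction=0.01):
    calibration_loss.backward()  # fill .grad with scores from one calibration batch
    scores = torch.cat([p.grad.detach().flatten() ** 2 for p in model.parameters()])
    k = max(1, int(keep_fraction * scores.numel()))
    threshold = scores.topk(k).values.min()
    masks = [(p.grad.detach() ** 2 >= threshold).float() for p in model.parameters()]
    model.zero_grad(set_to_none=True)
    return masks

def masked_step(model, masks, optimizer, loss):
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            if p.grad is not None:
                p.grad.mul_(m)  # drop gradients outside the fixed mask
    optimizer.step()
```

Note that masking gradients alone keeps plain SGD cheap; a stateful optimizer such as Adam would still allocate state for every parameter unless its state is restricted to the masked entries as well.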
- Representation Memorization for Fast Learning New Knowledge without Forgetting [36.55736909586313]
The ability to quickly learn new knowledge is a big step towards human-level intelligence.
We consider scenarios that require learning new classes or data distributions quickly and incrementally over time.
We propose "Memory-based Hebbian Adaptation" to tackle the two major challenges.
arXiv Detail & Related papers (2021-08-28T07:54:53Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)