Exploring Low Rank Training of Deep Neural Networks
- URL: http://arxiv.org/abs/2209.13569v1
- Date: Tue, 27 Sep 2022 17:43:45 GMT
- Title: Exploring Low Rank Training of Deep Neural Networks
- Authors: Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba,
Yarin Gal, Aidan N. Gomez
- Abstract summary: Training deep neural networks in low rank offers efficiency over unfactorised training in terms of both memory consumption and training time.
We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field.
- Score: 49.18122605463354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training deep neural networks in low rank, i.e. with factorised layers, is of
particular interest to the community: it offers efficiency over unfactorised
training in terms of both memory consumption and training time. Prior work has
focused on low rank approximations of pre-trained networks and training in low
rank space with additional objectives, offering various ad hoc explanations for
chosen practice. We analyse techniques that work well in practice, and through
extensive ablations on models such as GPT2 we provide evidence falsifying
common beliefs in the field, hinting in the process at exciting research
opportunities that still need answering.
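For context, a minimal sketch of what "training in low rank" means in practice: the dense weight of a linear layer is replaced by a product of two thin trainable factors, so the full matrix is never stored or updated. The use of PyTorch, the rank, the initialisation scale, and the class name below are illustrative assumptions, not the paper's ablated settings.
```python
# Minimal sketch of a low-rank factorised linear layer (illustrative only;
# the paper studies several initialisation/decomposition choices not shown here).
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Replaces a dense (out x in) weight with the product U @ V of two trainable factors."""
    def __init__(self, in_features, out_features, rank, bias=True):
        super().__init__()
        # Simple scaled Gaussian init; an assumption for this sketch.
        self.U = nn.Parameter(torch.randn(out_features, rank) * (1.0 / rank) ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, in_features) * (1.0 / in_features) ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x):
        # Multiply by V first so the full (out x in) matrix is never materialised.
        out = (x @ self.V.t()) @ self.U.t()
        return out + self.bias if self.bias is not None else out

# Parameter count drops from m*n to r*(m+n):
# a 1024x1024 layer at rank 64 stores ~0.13M instead of ~1.05M weights.
layer = LowRankLinear(1024, 1024, rank=64)
y = layer(torch.randn(8, 1024))
```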
Related papers
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Deep Fusion: Efficient Network Training via Pre-trained Initializations [3.9146761527401424]
We present Deep Fusion, an efficient approach to network training that leverages pre-trained initializations of smaller networks.
Our experiments show how Deep Fusion is a practical and effective approach that not only accelerates the training process but also reduces computational requirements.
We validate our theoretical framework, which guides the optimal use of Deep Fusion, showing that it significantly reduces both training time and resource consumption.
arXiv Detail & Related papers (2023-06-20T21:30:54Z)
- Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation [12.653673008542155]
Deep neural networks are an attractive component of the common interpretation pipeline.
However, they are frequently met with distrust because they can produce semantically incorrect outputs when exposed to sections the model was not trained on.
We introduce a method that relates semantically incorrect predictions to their respective positions within the neural network's representation manifold.
arXiv Detail & Related papers (2023-02-24T19:19:22Z)
- Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations [5.17729871332369]
State-of-the-art quantization techniques are currently applied to both the weights and activations of deep neural networks.
In this work, we jointly apply novel uniform quantization and unstructured pruning methods to both the weights and activations of deep neural networks during training (a generic sketch of this kind of combination appears after this list).
arXiv Detail & Related papers (2021-10-15T16:14:36Z)
- Explaining Deep Learning Representations by Tracing the Training Process [10.774699463547439]
We propose a novel method that explains the decisions of a deep neural network.
We investigate how the intermediate representations at each layer of the deep network were refined during the training process.
We show that our method identifies highly representative training instances that can be used as an explanation.
arXiv Detail & Related papers (2021-09-13T11:29:04Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Dynamic Sparse Training for Deep Reinforcement Learning [36.66889208433228]
We propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch.
Our approach is easy to integrate into existing deep reinforcement learning algorithms.
We evaluate our approach on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2021-06-08T09:57:20Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks so that they fit on mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
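As a companion to the joint quantization-and-pruning entry in the list above, here is a generic, hedged sketch of the common pattern such methods build on: straight-through fake quantisation of weights and activations combined with a magnitude-based binary mask applied during training. The function names, bit-width, and sparsity level are illustrative assumptions; this is not the cited paper's exact algorithm.
```python
# Generic illustration of quantization-aware training combined with magnitude
# pruning of a weight tensor. This is NOT the cited paper's method, just a
# common pattern: straight-through fake quantisation plus a binary mask.
import torch

def fake_quantize(t, num_bits=8):
    """Symmetric uniform fake quantisation with a straight-through gradient."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = t.detach().abs().max().clamp(min=1e-8) / qmax
    q = (t / scale).round().clamp(-qmax, qmax) * scale
    return t + (q - t).detach()  # forward: quantised values; backward: identity

def magnitude_mask(w, sparsity=0.5):
    """Binary mask that zeroes the smallest-magnitude fraction of weights."""
    k = int(sparsity * w.numel())
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    return (w.abs() > threshold).float()

w = torch.randn(256, 256, requires_grad=True)
x = torch.randn(32, 256)
mask = magnitude_mask(w.detach(), sparsity=0.5)       # typically recomputed periodically
out = fake_quantize(x) @ fake_quantize(w * mask).t()  # quantise activations and masked weights
out.sum().backward()                                  # gradients flow via the straight-through estimator
```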
This list is automatically generated from the titles and abstracts of the papers on this site.