Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
- URL: http://arxiv.org/abs/2405.15481v1
- Date: Fri, 24 May 2024 11:59:41 GMT
- Title: Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
- Authors: Jialin Zhao, Yingtao Zhang, Xinghang Li, Huaping Liu, Carlo Vittorio Cannistraci
- Abstract summary: We propose Sparse Spectral Training (SST), an advanced training methodology that updates all singular values and selectively updates singular vectors of network weights.
SST refines the training process by employing a targeted updating strategy for singular vectors, which is determined by a multinomial sampling method weighted by the significance of the singular values.
On OPT-125M, with a rank equal to 8.3% of the embedding dimension, SST reduces the perplexity gap to full-rank training by 67.6%, demonstrating a significant reduction of the performance loss seen with prevalent low-rank methods.
- Score: 9.96381061452642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing computational demands posed by the increasing number of neural network parameters necessitate low-memory-consumption training approaches. Previous memory reduction techniques, such as Low-Rank Adaptation (LoRA) and ReLoRA, suffer from the limitations of low rank and saddle point issues, particularly during intensive tasks like pre-training. In this paper, we propose Sparse Spectral Training (SST), an advanced training methodology that updates all singular values and selectively updates singular vectors of network weights, thereby optimizing resource usage while closely approximating full-rank training. SST refines the training process by employing a targeted updating strategy for singular vectors, which is determined by a multinomial sampling method weighted by the significance of the singular values, ensuring both high performance and memory reduction. Through comprehensive testing on both Euclidean and hyperbolic neural networks across various tasks, including natural language generation, machine translation, node classification and link prediction, SST demonstrates its capability to outperform existing memory reduction training methods and is comparable with full-rank training in some cases. On OPT-125M, with a rank equal to 8.3% of the embedding dimension, SST reduces the perplexity gap to full-rank training by 67.6%, demonstrating a significant reduction of the performance loss seen with prevalent low-rank methods. This approach offers a strong alternative to traditional training techniques, paving the way for more efficient and scalable neural network training solutions.
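To make the abstract's description concrete, below is a minimal PyTorch sketch of the core idea as stated there: a weight kept in factored form W = U diag(S) Vh, where every step updates all singular values while only a multinomially sampled subset of singular vectors (weighted by singular-value magnitude) receives gradients. This is not the authors' implementation; names such as SpectralLinear and resample_active_vectors are illustrative, and the paper's resampling schedule and optimizer details are omitted.

```python
import torch
import torch.nn as nn


class SpectralLinear(nn.Module):
    """Linear layer stored in factored form W = U diag(S) Vh (illustrative sketch)."""

    def __init__(self, in_features: int, out_features: int, n_active: int):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        self.U = nn.Parameter(u)    # (out_features, r) left singular vectors
        self.S = nn.Parameter(s)    # (r,) singular values -- always updated
        self.Vh = nn.Parameter(vh)  # (r, in_features) right singular vectors
        self.n_active = n_active
        self.register_buffer("active", torch.arange(n_active))

    @torch.no_grad()
    def resample_active_vectors(self):
        # Multinomial sampling of singular-vector indices, weighted by the
        # magnitude of the corresponding singular values (as in the abstract).
        probs = self.S.abs() / self.S.abs().sum()
        self.active = torch.multinomial(probs, self.n_active, replacement=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # All singular values receive gradients; only the sampled singular
        # vectors do -- the remaining columns/rows are detached this step.
        mask = torch.zeros_like(self.S, dtype=torch.bool)
        mask[self.active] = True
        U = torch.where(mask[None, :], self.U, self.U.detach())
        Vh = torch.where(mask[:, None], self.Vh, self.Vh.detach())
        # y = x W^T with W = U diag(S) Vh
        return ((x @ Vh.t()) * self.S) @ U.t()
```

In a training loop one would call resample_active_vectors() periodically (e.g. every few steps) to rotate which singular vectors are trainable, and reconstruct W = U diag(S) Vh only when exporting the model for inference; the exact schedule used in the paper may differ from this sketch.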
Related papers
- Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training [30.452060061499523]
We introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation.
Experiments demonstrate the effectiveness of the approximation technique in neural network training.
arXiv Detail & Related papers (2024-03-18T23:23:50Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Learning in Feedback-driven Recurrent Spiking Neural Networks using full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, in which a second network is introduced only during training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure to model 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z) - Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z) - Dynamic Sparse Training for Deep Reinforcement Learning [36.66889208433228]
We propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch.
Our approach is easy to integrate into existing deep reinforcement learning algorithms.
We evaluate our approach on OpenAI gym continuous control tasks.
arXiv Detail & Related papers (2021-06-08T09:57:20Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results on Penn TreeBank and Wikitext-2.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)