Scaling Private Deep Learning with Low-Rank and Sparse Gradients
- URL: http://arxiv.org/abs/2207.02699v1
- Date: Wed, 6 Jul 2022 14:09:47 GMT
- Title: Scaling Private Deep Learning with Low-Rank and Sparse Gradients
- Authors: Ryuichi Ito, Seng Pei Liew, Tsubasa Takahashi, Yuya Sasaki, Makoto
Onizuka
- Abstract summary: We propose a framework that exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates.
A novel strategy is utilized to sparsify the gradients, resulting in low-dimensional, less noisy updates.
Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
- Score: 5.14780936727027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applying Differentially Private Stochastic Gradient Descent (DPSGD) to
training modern, large-scale neural networks such as transformer-based models
is a challenging task, as the magnitude of noise added to the gradients at each
iteration scales with model dimension, hindering the learning capability
significantly. We propose a unified framework, $\textsf{LSG}$, that fully
exploits the low-rank and sparse structure of neural networks to reduce the
dimension of gradient updates, and hence alleviate the negative impacts of
DPSGD. The gradient updates are first approximated with a pair of low-rank
matrices. Then, a novel strategy is utilized to sparsify the gradients,
resulting in low-dimensional, less noisy updates that are yet capable of
retaining the performance of neural networks. Empirical evaluation on natural
language processing and computer vision tasks shows that our method outperforms
other state-of-the-art baselines.
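
The abstract describes a two-step compression of each gradient update (a low-rank factorization followed by sparsification) before the DPSGD noise is added. Below is a minimal NumPy sketch of that general pipeline, intended only as an illustration: the truncated SVD, the magnitude-based top-k masking, the noise placement, and all function names are assumptions made here for readability and do not reproduce the paper's exact factorization, sparsification rule, or privacy accounting.

```python
import numpy as np

def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip_norm / (norm + 1e-12))

def low_rank_factors(grad, rank):
    """Approximate grad (d_out x d_in) with a pair of low-rank matrices L, R,
    so that grad ~= L @ R. Truncated SVD is used here as an illustrative
    stand-in for the paper's factorization."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    L = U[:, :rank] * s[:rank]   # d_out x rank
    R = Vt[:rank, :]             # rank x d_in
    return L, R

def sparsify(mat, keep_ratio):
    """Keep only the largest-magnitude entries (illustrative top-k rule)."""
    k = max(1, int(keep_ratio * mat.size))
    threshold = np.partition(np.abs(mat).ravel(), -k)[-k]
    return mat * (np.abs(mat) >= threshold)

def private_low_rank_sparse_update(per_example_grads, clip_norm, rank,
                                   keep_ratio, noise_multiplier, rng):
    """One DPSGD-style step on the reduced (low-rank, sparse) representation.

    Noise is added to the summed clipped factors and scaled by clip_norm, so
    its dimension matches the compressed update rather than the full layer.
    This is a sketch, not a faithful privacy analysis.
    """
    sum_L, sum_R = None, None
    for g in per_example_grads:
        g = clip_gradient(g, clip_norm)
        L, R = low_rank_factors(g, rank)
        L, R = sparsify(L, keep_ratio), sparsify(R, keep_ratio)
        sum_L = L if sum_L is None else sum_L + L
        sum_R = R if sum_R is None else sum_R + R

    sigma = noise_multiplier * clip_norm
    noisy_L = sum_L + rng.normal(0.0, sigma, size=sum_L.shape)
    noisy_R = sum_R + rng.normal(0.0, sigma, size=sum_R.shape)
    # Reconstruct a full-size update from the noisy low-rank factors.
    return (noisy_L @ noisy_R) / len(per_example_grads)

# Toy usage: a 64x32 layer, 8 examples, rank-4 factors.
rng = np.random.default_rng(0)
grads = [rng.normal(size=(64, 32)) for _ in range(8)]
update = private_low_rank_sparse_update(grads, clip_norm=1.0, rank=4,
                                        keep_ratio=0.5, noise_multiplier=1.1,
                                        rng=rng)
print(update.shape)  # (64, 32)
```

The design intuition this sketch tries to convey matches the abstract: since the noise scale is tied to the clipping norm, shrinking the dimension of the object being perturbed reduces the total noise injected per step.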
Related papers
- Gradient Rewiring for Editable Graph Neural Network Training [84.77778876113099]
We propose a simple yet effective Gradient Rewiring method for Editable graph neural network training, named GRE.
arXiv Detail & Related papers (2024-10-21T01:01:50Z) - Occam Gradient Descent [0.0]
Occam Gradient Descent is an algorithm that interleaves reducing model size to minimize generalization error with gradient descent on model weights to minimize fitting error.
Our algorithm simultaneously descends the space of weights and topological size of any neural network without modification.
arXiv Detail & Related papers (2024-05-30T15:58:22Z) - Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks [15.691263438655842]
Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention.
Training an SNN directly poses a challenge due to the undefined gradient of the firing spike process.
We propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers.
arXiv Detail & Related papers (2024-01-09T10:54:41Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been effectively demonstrated in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ an implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Improving Deep Learning Interpretability by Saliency Guided Training [36.782919916001624]
Saliency methods have been widely used to highlight important input features in model predictions.
Most existing methods use backpropagation on a modified gradient function to generate saliency maps.
We introduce a saliency guided training procedure for neural networks to reduce noisy gradients used in predictions.
arXiv Detail & Related papers (2021-11-29T06:05:23Z) - Inertial Proximal Deep Learning Alternating Minimization for Efficient
Neural Network Training [16.165369437324266]
This work improves DLAM with the well-known inertial technique, yielding iPDLAM, which predicts a point by linearizing the current and previous iterates.
Numerical results on real-world datasets are reported to demonstrate the efficiency of our proposed algorithm.
arXiv Detail & Related papers (2021-01-30T16:40:08Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)