RSO: A Gradient Free Sampling Based Approach For Training Deep Neural
Networks
- URL: http://arxiv.org/abs/2005.05955v1
- Date: Tue, 12 May 2020 17:55:16 GMT
- Title: RSO: A Gradient Free Sampling Based Approach For Training Deep Neural
Networks
- Authors: Rohun Tripathi and Bharat Singh
- Abstract summary: RSO is a gradient free Markov Chain Monte Carlo search based approach for training deep neural networks.
RSO is evaluated on classification tasks on MNIST and CIFAR-10 datasets with deep neural networks of 6 to 10 layers.
- Score: 10.292439652458153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose RSO (random search optimization), a gradient free Markov Chain
Monte Carlo search based approach for training deep neural networks. To this
end, RSO adds a perturbation to a weight in a deep neural network and tests if
it reduces the loss on a mini-batch. If this reduces the loss, the weight is
updated, otherwise the existing weight is retained. Surprisingly, we find that
repeating this process a few times for each weight is sufficient to train a
deep neural network. The number of weight updates for RSO is an order of
magnitude smaller than for backpropagation with SGD. RSO can make
aggressive weight updates in each step as there is no concept of learning rate.
The weight update step for individual layers is also not coupled with the
magnitude of the loss. RSO is evaluated on classification tasks on MNIST and
CIFAR-10 datasets with deep neural networks of 6 to 10 layers where it achieves
an accuracy of 99.1% and 81.8% respectively. We also find that after updating
the weights just 5 times, the algorithm obtains a classification accuracy of
98% on MNIST.
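The accept/reject loop described in the abstract is simple enough to sketch directly. Below is a minimal NumPy sketch of one sweep over the weights, assuming a generic `loss_fn(weights, batch)` that returns the mini-batch loss; the Gaussian perturbation and the scale `sigma` are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of an RSO-style sweep (not the authors' code), assuming
# `weights` is a list of NumPy arrays (one per layer) and `loss_fn(weights, batch)`
# returns the loss on a mini-batch. Gaussian noise and `sigma` are assumptions.
import numpy as np

def rso_sweep(weights, loss_fn, batch, sigma=0.01, rng=None):
    """Perturb one weight at a time; keep the change only if the loss drops."""
    rng = rng or np.random.default_rng(0)
    best = loss_fn(weights, batch)
    for w in weights:                          # one array per layer
        flat = w.reshape(-1)                   # view: edits write through to w
        for i in range(flat.size):
            old = flat[i]
            flat[i] = old + sigma * rng.standard_normal()   # try a perturbation
            trial = loss_fn(weights, batch)
            if trial < best:
                best = trial                   # accept: the loss went down
            else:
                flat[i] = old                  # reject: restore the old weight
    return weights
```

Repeating such sweeps a few times per weight on fresh mini-batches is what the abstract reports as sufficient to train 6-to-10-layer networks; there is no learning rate, only the comparison against the current loss.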
Related papers
- Weights Augmentation: it has never ever ever ever let her model down [1.5020330976600735]
This article proposes the concept of weight augmentation, focusing on weight exploration.
The core of the Weight Augmentation Strategy (WAS) is to train with randomly transformed weight coefficients, named Shadow Weights (SW), which are used to compute the loss function.
Our experimental results show that convolutional neural networks such as VGG-16, ResNet-18, ResNet-34, GoogLeNet, MobileNetV2, and EfficientNet-Lite can benefit substantially at little or no cost.
arXiv Detail & Related papers (2024-05-30T00:57:06Z) - Improved Generalization of Weight Space Networks via Augmentations [56.571475005291035]
Learning in deep weight spaces (DWS) is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs).
We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets.
To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
arXiv Detail & Related papers (2024-02-06T15:34:44Z) - Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of the weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z) - InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - WeightMom: Learning Sparse Networks using Iterative Momentum-based
pruning [0.0]
We propose a weight-based pruning approach in which weights are pruned gradually based on the momentum accumulated over previous iterations (see the momentum-pruning sketch after this list).
We evaluate our approach on networks such as AlexNet, VGG16 and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-08-11T07:13:59Z) - Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z) - Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain a single model by taking a spatial average in weight space (see the weight-averaging sketch after this list).
arXiv Detail & Related papers (2020-07-25T13:23:37Z) - RIFLE: Backpropagation in Depth for Deep Transfer Learning through
Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z) - Training highly effective connectivities within neural networks with
randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even when the weights have constant magnitude or are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
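The WeightMom entry above describes pruning weights gradually by the momentum of previous iterations; a minimal per-layer sketch of that idea follows. The exponential momentum buffer, the binary mask, and the per-step pruning fraction are illustrative assumptions, not details from that paper.

```python
# Illustrative momentum-based pruning step (assumptions noted above); w, grad,
# velocity, and mask are NumPy arrays of the same shape for a single layer.
import numpy as np

def momentum_prune_step(w, grad, velocity, mask, beta=0.9, prune_frac=0.01):
    """Update a momentum buffer, then zero out the surviving weights whose
    accumulated momentum magnitude is smallest."""
    velocity[:] = beta * velocity + (1.0 - beta) * grad    # running gradient momentum
    scores = np.where(mask > 0, np.abs(velocity), np.inf)  # never re-select pruned weights
    k = int(prune_frac * int(mask.sum()))                  # how many weights to prune now
    if k > 0:
        idx = np.argsort(scores, axis=None)[:k]            # smallest-momentum survivors
        mask.flat[idx] = 0.0
    w *= mask                                               # enforce the sparsity pattern
    return w, velocity, mask
```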
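The late-phase-weights entry above collapses an ensemble back into one network by a spatial average in weight space; the weight-averaging sketch below shows that final step only, with a dictionary-of-arrays parameter format assumed for illustration.

```python
# Parameter-wise mean over ensemble members; the dict-of-arrays format is assumed.
import numpy as np

def average_weights(members):
    """Collapse an ensemble of weight dictionaries into a single model by
    taking the parameter-wise mean in weight space."""
    return {name: np.mean([m[name] for m in members], axis=0)
            for name in members[0]}

# Example: average three hypothetical late-phase ensemble members.
members = [{"fc.weight": np.random.randn(4, 3), "fc.bias": np.random.randn(4)}
           for _ in range(3)]
single_model = average_weights(members)
```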
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.