POGD: Gradient Descent with New Stochastic Rules
- URL: http://arxiv.org/abs/2210.10654v1
- Date: Sat, 15 Oct 2022 12:31:02 GMT
- Title: POGD: Gradient Descent with New Stochastic Rules
- Authors: Feihu Han, Sida Xing, Sui Yang Khoo
- Abstract summary: The experiments in this paper mainly focus on the training speed needed to reach a target value and on the ability to avoid local minima.
The experiments are carried out with convolutional neural network (CNN) image classification on the MNIST and CIFAR-10 datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Particle Optimized Gradient Descent (POGD), an algorithm
based on gradient descent that integrates the particle swarm optimization (PSO)
principle into its iteration. The experiments show that the algorithm has
adaptive learning ability. The experiments in this paper mainly focus on the
training speed needed to reach a target value and on the ability to avoid local
minima. They are carried out with convolutional neural network (CNN) image
classification on the MNIST and CIFAR-10 datasets.
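The abstract does not spell out the update rule, so the following is only a minimal sketch of how a PSO-style velocity term could be blended into a gradient step; the inertia weight w, the attraction coefficient c1, and the single-particle best-iterate tracking are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pogd_sketch(grad_fn, loss_fn, theta0, lr=0.01, w=0.9, c1=0.5, steps=200, seed=0):
    """Illustrative PSO-flavoured gradient descent (not the paper's exact rule).

    The velocity mixes a plain gradient step with an attraction towards the best
    parameters seen so far, mimicking the PSO best-position term; with a single
    particle the personal and global bests coincide."""
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    best_theta, best_loss = theta.copy(), loss_fn(theta)
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        r1 = rng.random(theta.shape)
        velocity = (w * velocity
                    - lr * grad_fn(theta)              # gradient-descent component
                    + c1 * r1 * (best_theta - theta))  # PSO-style attraction to the best iterate
        theta = theta + velocity
        loss = loss_fn(theta)
        if loss < best_loss:                           # keep the best parameters found so far
            best_theta, best_loss = theta.copy(), loss
    return best_theta, best_loss

# toy usage: minimise a simple quadratic with minimiser at [3, 3]
loss_fn = lambda x: float(np.sum((x - 3.0) ** 2))
grad_fn = lambda x: 2.0 * (x - 3.0)
print(pogd_sketch(grad_fn, loss_fn, np.zeros(2)))
```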
Related papers
- Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
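For the ISGD entry above: the implicit idea is a proximal update, theta_new = theta - lr * grad(theta_new), rather than the explicit step of plain SGD. The sketch below approximates one such step by inner gradient descent on the proximal objective; it is a generic implicit-SGD step with illustrative step sizes, not the PINN-specific training procedure of the cited paper.

```python
import numpy as np

def implicit_sgd_step(grad_fn, theta, lr=0.1, inner_lr=0.01, inner_iters=50):
    """One implicit (proximal) SGD step:
        theta_new = argmin_z f(z) + ||z - theta||^2 / (2 * lr),
    approximated here by inner gradient descent on the proximal objective.
    Generic sketch; the cited paper's PINN-specific solver differs."""
    z = np.array(theta, dtype=float)
    for _ in range(inner_iters):
        z = z - inner_lr * (grad_fn(z) + (z - theta) / lr)   # gradient of the proximal objective
    return z

# toy usage on a stiff quadratic: the explicit step theta - 0.1 * grad(theta) would
# diverge (it multiplies the error by -4), while the implicit step stays stable.
grad_fn = lambda x: 50.0 * x          # gradient of 25 * x^2, curvature 50
theta = np.array([1.0])
for _ in range(20):
    theta = implicit_sgd_step(grad_fn, theta, lr=0.1)
print(theta)                           # decays towards 0
```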
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
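For context on the forward-gradient entry above: the basic weight-space estimator is g_hat = (directional derivative of the loss along a random direction v) * v. In the sketch below the directional derivative is approximated with a finite difference so the example stays self-contained; in practice it is obtained with forward-mode AD (a JVP), and the cited paper further reduces variance by perturbing activations and using local losses, which is not shown here.

```python
import numpy as np

def forward_gradient_estimate(loss_fn, params, eps=1e-4, rng=None):
    """Weight-space forward-gradient estimator: g_hat = (dL/dv) * v with v ~ N(0, I).

    The directional derivative dL/dv is computed with a central finite difference
    for self-containment; forward-mode AD would normally be used instead."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(params.shape)                                    # random tangent direction
    dir_deriv = (loss_fn(params + eps * v) - loss_fn(params - eps * v)) / (2 * eps)
    return dir_deriv * v                                                     # Monte-Carlo estimate of the gradient

# toy check against the true gradient of a quadratic
loss = lambda w: float(np.sum(w ** 2))
w = np.array([1.0, -2.0, 0.5])
est = np.mean([forward_gradient_estimate(loss, w, rng=np.random.default_rng(s))
               for s in range(2000)], axis=0)
print(est, 2 * w)   # the averaged estimate approaches the true gradient [2, -4, 1]
```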
- Orthogonalising gradients to speed up neural network optimisation [0.0]
Optimisation of neural networks can be sped up by orthogonalising the gradients before the optimisation step, ensuring the diversification of the learned representations.
We tested this method on ImageNet and CIFAR-10, obtaining a large decrease in learning time, and also obtained a speed-up on the semi-supervised learning method BarlowTwins.
arXiv Detail & Related papers (2022-02-14T21:46:07Z)
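For the gradient-orthogonalisation entry above: a common way to orthogonalise a layer's gradient matrix is to replace it with the nearest semi-orthogonal matrix via its SVD. Whether this matches the cited paper's exact per-layer procedure is an assumption.

```python
import numpy as np

def orthogonalise_gradient(grad):
    """Replace a layer's gradient matrix with the nearest semi-orthogonal matrix
    (U @ Vt from its SVD). Generic sketch of gradient orthogonalisation; the
    cited paper's exact per-layer procedure may differ."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

# usage: the orthogonalised gradient has unit singular values but keeps the
# original row/column subspaces of the raw gradient
g = np.random.default_rng(0).standard_normal((4, 3))
g_orth = orthogonalise_gradient(g)
print(np.allclose(g_orth.T @ g_orth, np.eye(3)))   # True: columns are orthonormal
```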
- Inertial Proximal Deep Learning Alternating Minimization for Efficient Neural Network Training [16.165369437324266]
This work develops an improved DLAM by the well-known inertial technique, namely iPDLAM, which predicts a point by linearization of the current and last iterates.
Numerical results on real-world datasets are reported to demonstrate the efficiency of our proposed algorithm.
arXiv Detail & Related papers (2021-01-30T16:40:08Z)
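For the iPDLAM entry above: the "prediction by linearization of the current and last iterates" is the standard inertial extrapolation point z = w_k + beta * (w_k - w_{k-1}). The snippet shows that generic step inside a plain gradient loop; the value of beta and the surrounding alternating-minimization updates of iPDLAM are not reproduced here.

```python
import numpy as np

def inertial_prediction(w_curr, w_prev, beta=0.7):
    """Inertial extrapolation point built from the current and last iterates:
    z = w_k + beta * (w_k - w_{k-1}). beta is an illustrative choice."""
    return w_curr + beta * (w_curr - w_prev)

# usage inside a plain gradient loop: take the gradient at the predicted point
grad_fn = lambda x: 2.0 * (x - 3.0)
w_prev, w_curr = np.zeros(2), np.zeros(2)
for _ in range(100):
    z = inertial_prediction(w_curr, w_prev)
    w_prev, w_curr = w_curr, z - 0.1 * grad_fn(z)   # extrapolated (Nesterov-style) step
print(w_curr)                                        # approaches the minimiser [3, 3]
```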
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient based DNNs with only one line of code.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
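A minimal sketch of the "one line of code" in the gradient-centralization entry above: subtract the mean from each gradient slice. The axis convention (centralizing over all axes except the output axis) follows the commonly used reference implementation and is an assumption about the details.

```python
import numpy as np

def centralize_gradient(grad):
    """Gradient centralization: subtract the mean so each output unit's gradient
    slice has zero mean, computed over all axes except the first (output) axis."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

# usage: the centralized gradient drops straight into an SGD update
rng = np.random.default_rng(0)
w_grad = rng.standard_normal((8, 16))            # e.g. a fully connected layer's gradient
w_grad = centralize_gradient(w_grad)
print(np.allclose(w_grad.mean(axis=1), 0.0))     # True: each row now has zero mean
```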
- Stochastic gradient descent with random learning rate [0.0]
We propose to optimize neural networks with a uniformly-distributed random learning rate.
By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime.
We provide supporting evidence through experiments on both shallow, fully-connected and deep, convolutional neural networks for image classification on the MNIST and CIFAR10 datasets.
arXiv Detail & Related papers (2020-03-15T21:36:46Z)
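A minimal sketch of the random-learning-rate protocol above: redraw the learning rate uniformly at every update. The sampling interval below is an illustrative choice, not the one used in the cited paper.

```python
import numpy as np

def sgd_random_lr(grad_fn, theta, lr_low=0.0, lr_high=0.02, steps=500, seed=0):
    """Gradient descent where the learning rate is redrawn uniformly at every step.
    The interval [lr_low, lr_high] is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        lr = rng.uniform(lr_low, lr_high)      # fresh random learning rate each update
        theta = theta - lr * grad_fn(theta)
    return theta

# toy usage on a quadratic with minimiser at [3, 3]
grad_fn = lambda x: 2.0 * (x - 3.0)
print(sgd_random_lr(grad_fn, np.zeros(2)))     # ends near [3, 3]
```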
- Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches [0.40611352512781856]
We show that it is possible to consider a multi-layer neural network as a stack of adaptive filters.
We introduce a better algorithm by predicting and then amending the adverse consequences of the actions that take place in BP, even before they happen.
Our experiments show the usefulness of our algorithm in the training of deep neural networks.
arXiv Detail & Related papers (2020-03-11T08:45:36Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the optimistic gradient algorithm for non-concave min-max problems.
Our experiments on GAN training show that the improvement of adaptive gradient algorithms over non-adaptive ones can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.