Weighted Aggregating Stochastic Gradient Descent for Parallel Deep
Learning
- URL: http://arxiv.org/abs/2004.03749v1
- Date: Tue, 7 Apr 2020 23:38:29 GMT
- Title: Weighted Aggregating Stochastic Gradient Descent for Parallel Deep
Learning
- Authors: Pengzhan Guo, Zeyang Ye, Keli Xiao, Wei Zhu
- Abstract summary: Solution involves a reformulation of the objective function for stochastic optimization in neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
- Score: 8.366415386275557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the stochastic optimization problem with a focus on
developing scalable parallel algorithms for deep learning tasks. Our solution
involves a reformulation of the objective function for stochastic optimization in
neural network models, along with a novel parallel strategy, coined weighted
aggregating stochastic gradient descent (WASGD). Following a theoretical
analysis on the characteristics of the new objective function, WASGD introduces
a decentralized weighted aggregating scheme based on the performance of local
workers. Without any center variable, the new method automatically assesses the
importance of local workers and incorporates them according to their contributions.
Furthermore, we have developed an enhanced version of the method, WASGD+, by
(1) considering a designed sample order and (2) applying a more advanced weight
evaluating function. To validate the new method, we benchmark our schemes
against several popular algorithms including the state-of-the-art techniques
(e.g., elastic averaging SGD) in training deep neural networks for
classification tasks. Comprehensive experiments have been conducted on four
classic datasets, including the CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST.
The subsequent results suggest the superiority of the WASGD scheme in
accelerating the training of deep architectures. Better still, the enhanced
version, WASGD+, has been shown to be a significant improvement over its basic
version.
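As a rough illustration of the decentralized weighted aggregating idea described above, the sketch below combines local workers' parameters into a performance-weighted average with no center variable. The softmax-over-negative-losses weight evaluating function, the temperature parameter, and all names (weighted_aggregate, worker_params, worker_losses) are illustrative assumptions; the paper's exact weighting scheme, communication schedule, and the WASGD+ sample ordering are not reproduced here.

```python
import numpy as np


def weighted_aggregate(worker_params, worker_losses, temperature=1.0):
    """Sketch of a decentralized, performance-weighted parameter aggregation.

    Assumption: each worker broadcasts its flattened parameter vector and a
    local performance measure (here, a recent mini-batch loss). Workers with
    lower loss receive larger weights via a softmax over negative losses; this
    particular weight evaluating function is illustrative, not the paper's.
    """
    losses = np.asarray(worker_losses, dtype=float)
    scores = -losses / temperature          # lower loss -> higher score
    scores -= scores.max()                  # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum()                # convex combination weights
    stacked = np.stack(worker_params)       # shape: (num_workers, num_params)
    aggregated = np.average(stacked, axis=0, weights=weights)
    return aggregated, weights


# Hypothetical usage: four workers with flattened parameter vectors.
rng = np.random.default_rng(0)
params = [rng.normal(size=8) for _ in range(4)]
losses = [0.9, 0.5, 0.7, 1.2]
new_params, w = weighted_aggregate(params, losses)
print("aggregation weights:", np.round(w, 3))
```

Because no center variable is maintained, each worker can run this aggregation locally from its peers' broadcast parameters and losses, which is what distinguishes the scheme from center-based methods such as elastic averaging SGD.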
Related papers
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - GRAWA: Gradient-based Weighted Averaging for Distributed Training of
Deep Learning Models [9.377424534371727]
We study distributed training of deep models in time-constrained environments.
We propose a new algorithm that periodically pulls workers towards the center variable computed as an average of workers.
arXiv Detail & Related papers (2024-03-07T04:22:34Z) - Optimal feature rescaling in machine learning based on neural networks [0.0]
An optimal rescaling of input features (OFR) is carried out by a Genetic Algorithm (GA).
The OFR reshapes the input space, improving the conditioning of the gradient-based algorithm used for training.
The approach has been tested on a FFNN modeling the outcome of a real industrial process.
arXiv Detail & Related papers (2024-02-13T21:57:31Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalizing from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Gradient Monitored Reinforcement Learning [0.0]
We focus on the enhancement of training and evaluation performance in reinforcement learning algorithms.
We propose an approach to steer the learning in the weight parameters of a neural network based on the dynamic development and feedback from the training process itself.
arXiv Detail & Related papers (2020-05-25T13:45:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.