Population Gradients improve performance across data-sets and
architectures in object classification
- URL: http://arxiv.org/abs/2010.12260v1
- Date: Fri, 23 Oct 2020 09:40:23 GMT
- Title: Population Gradients improve performance across data-sets and
architectures in object classification
- Authors: Yurika Sakai, Andrey Kormilitzin, Qiang Liu, Alejo Nevado-Holgado
- Abstract summary: We present a new method to calculate the gradients while training Neural Networks (NNs).
It significantly improves final performance across architectures, data-sets, hyper-parameter values, training length, and model sizes.
Besides being effective in the wide array of situations that we have tested, the increase in performance (e.g. F1) is as high as or higher than that of all the other widespread performance-improving methods.
- Score: 6.17047113475566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The most successful methods, such as ReLU transfer functions,
batch normalization, Xavier initialization, dropout, learning rate decay, and
dynamic optimizers, have become standards in the field due particularly to
their ability to increase the performance of Neural Networks (NNs)
significantly and in almost all situations. Here we present a new method to
calculate the gradients while training NNs, and show that it significantly
improves final performance across architectures, data-sets, hyper-parameter
values, training length, and model sizes, including when it is combined with
other common performance-improving methods (such as the ones mentioned above).
Besides being effective in the wide array of situations that we have tested,
the increase in performance (e.g. F1) it provides is as high as or higher than
that of all the other widespread performance-improving methods that we have
compared against. We call our method Population Gradients (PG), and it
consists of using a population of NNs to calculate a non-local estimate of the
gradient, which is closer to the theoretical exact gradient (i.e. the one
obtainable only with an infinitely large data-set) of the error function than
the empirical gradient (i.e. the one obtained with the real, finite data-set).
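The core idea, as the abstract describes it, is to estimate the gradient from a population of NNs rather than from a single model. The sketch below illustrates one possible reading on a toy least-squares model: the gradient is averaged over weight vectors perturbed around the current one, giving a non-local, smoothed estimate. The perturbation scheme and the `n_members` and `sigma` parameters are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def empirical_grad(w, X, y):
    # Analytic gradient of the mean squared error L(w) = mean((Xw - y)^2).
    return 2.0 * X.T @ (X @ w - y) / len(y)

def population_gradient(w, X, y, n_members=10, sigma=0.05, rng=None):
    # Average the empirical gradient over a population of weight vectors
    # perturbed around w, yielding a non-local, smoothed gradient estimate.
    rng = rng or np.random.default_rng(0)
    members = [w + sigma * rng.standard_normal(w.shape)
               for _ in range(n_members)]
    return np.mean([empirical_grad(m, X, y) for m in members], axis=0)
```

With a small `sigma` the estimate stays close to the empirical gradient, so plain gradient-descent steps using it still reduce the loss, while the averaging over a neighbourhood of the current weights is what makes the estimate non-local.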
Related papers
- Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks [37.41604955004456]
Graph neural networks (GNNs) have demonstrated remarkable success in graph representation learning.
Various sampling approaches have been proposed to scale GNNs to applications with large-scale graphs.
arXiv Detail & Related papers (2024-10-07T18:29:02Z)
- An Effective Dynamic Gradient Calibration Method for Continual Learning
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z)
- Efficient and Effective Implicit Dynamic Graph Neural Network [42.49148111696576]
We present Implicit Dynamic Graph Neural Network (IDGNN), a novel implicit neural network for dynamic graphs.
A key characteristic of IDGNN is that it is demonstrably well-posed, i.e., it is theoretically guaranteed to have a fixed-point representation.
arXiv Detail & Related papers (2024-06-25T19:07:21Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning [13.937644559223548]
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning.
We propose an effective method to improve the model generalization by penalizing the gradient norm of loss function during optimization.
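The penalty idea can be sketched on a toy least-squares objective: the gradient of the penalty term lam * ||∇L(w)|| equals H(w)∇L/||∇L||, a Hessian-vector product that one extra gradient evaluation approximates by finite differences. This is an illustrative sketch under that approximation, not the paper's exact first-order scheme.

```python
import numpy as np

def grad(w, X, y):
    # Gradient of L(w) = mean((Xw - y)^2).
    return 2.0 * X.T @ (X @ w - y) / len(y)

def penalized_grad(w, X, y, lam=0.1, eps=1e-3):
    # Gradient of L(w) + lam * ||grad L(w)||.  Since
    # d/dw ||grad L|| = H(w) g / ||g||, a finite difference along the
    # normalized gradient direction approximates the Hessian-vector term.
    g = grad(w, X, y)
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return g  # at a stationary point the penalty gradient also vanishes
    g_ahead = grad(w + eps * g / gnorm, X, y)
    return g + lam * (g_ahead - g) / eps
```

Descending this penalized gradient steers optimization toward regions where the loss surface is flat (small gradient norm), which is the mechanism the paper links to better generalization.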
arXiv Detail & Related papers (2022-02-08T02:03:45Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection [2.7561479348365734]
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks.
In this paper, we consider training of DNNs, which arises in many state-of-the-art applications.
arXiv Detail & Related papers (2020-07-26T16:29:39Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
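The generic strategy can be illustrated with a central finite difference: the directional derivative of f at x along v needs only two function evaluations and no gradient computation. This is a minimal sketch of that idea, not the paper's full score-matching estimator.

```python
import numpy as np

def directional_derivative(f, x, v, eps=1e-4):
    # Central-difference estimate of the derivative of f at x along v:
    # two function evaluations, no gradients, trivially parallelizable.
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)
```

For f(x) = sum(x**2) the exact directional derivative is 2 x·v, which the central-difference estimate matches up to an O(eps**2) error.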
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.