Gradient Centralization: A New Optimization Technique for Deep Neural
Networks
- URL: http://arxiv.org/abs/2004.01461v2
- Date: Wed, 8 Apr 2020 03:40:44 GMT
- Title: Gradient Centralization: A New Optimization Technique for Deep Neural
Networks
- Authors: Hongwei Yong, Jianqiang Huang, Xiansheng Hua and Lei Zhang
- Abstract summary: Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient-based DNN optimizers with only one line of code.
- Score: 74.935141515523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimization techniques are of great importance to effectively and
efficiently train a deep neural network (DNN). It has been shown that using the
first and second order statistics (e.g., mean and variance) to perform Z-score
standardization on network activations or weight vectors, such as batch
normalization (BN) and weight standardization (WS), can improve the training
performance. Different from these existing methods that mostly operate on
activations or weights, we present a new optimization technique, namely
gradient centralization (GC), which operates directly on gradients by
centralizing the gradient vectors to have zero mean. GC can be viewed as a
projected gradient descent method with a constrained loss function. We show
that GC can regularize both the weight space and output feature space so that
it can boost the generalization performance of DNNs. Moreover, GC improves the
Lipschitzness of the loss function and its gradient so that the training
process becomes more efficient and stable. GC is very simple to implement and
can be easily embedded into existing gradient based DNN optimizers with only
one line of code. It can also be directly used to fine-tune the pre-trained
DNNs. Our experiments on various applications, including general image
classification, fine-grained image classification, detection and segmentation,
demonstrate that GC can consistently improve the performance of DNN learning.
The code of GC can be found at
https://github.com/Yonghongwei/Gradient-Centralization.
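For reference, here is a minimal PyTorch sketch of the centralization step described above, assuming the convention from the abstract that each multi-dimensional weight gradient is shifted to zero mean before the optimizer update; the function names and the plain-SGD wrapper are illustrative, not the repository's API.

```python
import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # Subtract the mean taken over all dimensions except the output (first)
    # one, so each output slice of the weight gradient has zero mean.
    # One-dimensional parameters such as biases are left unchanged.
    if grad.dim() > 1:
        return grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad

def sgd_step_with_gc(params, lr=0.1):
    # Plain SGD update with GC applied to each gradient; the centralization
    # itself is the single line inside centralize_gradient.
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(centralize_gradient(p.grad), alpha=-lr)
```

Called on a model's parameters after loss.backward(), this differs from vanilla SGD only in the mean-subtraction line, which mirrors the abstract's one-line-of-code claim.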
Related papers
- Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks [5.507301894089302]
This paper is the first attempt to study a new optimization technique for deep neural networks that uses the sum normalization of a gradient vector as coefficients.
The proposed technique is hence named adaptive gradient regularization (AGR).
arXiv Detail & Related papers (2024-07-24T02:23:18Z)
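The entry above leaves the exact update rule open; purely as an assumed reading of "sum normalization of a gradient vector as coefficients" (a hedged sketch, not necessarily AGR's actual rule), one might write:

```python
import torch

def agr_like_gradient(grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Assumed reading of the summary: normalize the gradient by the sum of
    # its absolute values to obtain per-element coefficients, then damp each
    # gradient entry by its own coefficient. Not necessarily AGR's exact rule.
    coeff = grad.abs() / (grad.abs().sum() + eps)
    return (1.0 - coeff) * grad
```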
- On the Initialization of Graph Neural Networks [10.153841274798829]
We analyze the variance of forward and backward propagation across Graph Neural Networks layers.
We propose a new method for Variance Instability Reduction within GNN Optimization (Virgo).
We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance.
arXiv Detail & Related papers (2023-12-05T09:55:49Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages the transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning [13.937644559223548]
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning.
We propose an effective method to improve model generalization by penalizing the gradient norm of the loss function during optimization.
arXiv Detail & Related papers (2022-02-08T02:03:45Z)
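As a rough illustration of the entry above, here is a naive double-backprop sketch in PyTorch that adds a term proportional to the gradient norm to the training loss; the paper itself uses a more efficient approximation, and the helper name and weight lam are assumptions.

```python
import torch

def loss_with_grad_norm_penalty(model, criterion, x, y, lam=0.01):
    # Task loss plus lam * ||gradient of the loss w.r.t. the parameters||.
    # create_graph=True keeps the gradient differentiable so the penalty
    # itself can be backpropagated (naive version, not the paper's scheme).
    loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    return loss + lam * grad_norm
```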
- Generalizable Cross-Graph Embedding for GNN-based Congestion Prediction [22.974348682859322]
We propose a framework that can directly learn embeddings for the given netlist to enhance the quality of our node features.
By combining the learned embedding on top of the netlist with the GNNs, our method improves prediction performance, generalizes to new circuit lines, and is efficient in training, potentially saving over 90% of runtime.
arXiv Detail & Related papers (2021-11-10T20:56:29Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
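The entry above only hints at the construction; purely as an assumed illustration of operating on a gradient channel by channel (not the paper's actual channel-directed definition), one could rescale each output-channel slice of a weight gradient:

```python
import torch

def per_channel_normalize_grad(grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Hypothetical illustration only: rescale each output-channel slice
    # (dim 0) of a conv/linear weight gradient to unit L2 norm, so the
    # descent direction is defined channel by channel.
    if grad.dim() < 2:
        return grad
    flat = grad.reshape(grad.shape[0], -1)
    norms = flat.norm(dim=1, keepdim=True).clamp_min(eps)
    return (flat / norms).reshape_as(grad)
```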
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
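Reading only the title and summary above, a minimal sketch of a layer kept in factored form throughout training might look like the following; the class name, initialization, and penalty weights are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    # Keep the weight as U @ diag(s) @ V.T throughout training, so no SVD is
    # ever computed; push U and V toward orthogonality and s toward sparsity
    # with the regularizer below (illustrative penalty weights).
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.01)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.01)

    def forward(self, x):
        # y = x @ W.T with W = U @ diag(s) @ V.T
        return ((x @ self.V) * self.s) @ self.U.t()

    def regularizer(self, ortho_w=1e-2, sparse_w=1e-3):
        eye_u = torch.eye(self.U.shape[1], device=self.U.device)
        eye_v = torch.eye(self.V.shape[1], device=self.V.device)
        ortho = ((self.U.t() @ self.U - eye_u) ** 2).sum() + \
                ((self.V.t() @ self.V - eye_v) ** 2).sum()
        return ortho_w * ortho + sparse_w * self.s.abs().sum()
```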
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.