Activated Gradients for Deep Neural Networks
- URL: http://arxiv.org/abs/2107.04228v1
- Date: Fri, 9 Jul 2021 06:00:55 GMT
- Title: Activated Gradients for Deep Neural Networks
- Authors: Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, and Mingsheng Shang
- Abstract summary: Deep neural networks often suffer from poor performance or even training failure due to the ill-conditioned problem.
In this paper, a novel method that applies a gradient activation function (GAF) to the gradient is proposed to handle these challenges.
- Score: 9.476778519758426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks often suffer from poor performance or even training
failure due to the ill-conditioned problem, the vanishing/exploding gradient
problem, and the saddle point problem. In this paper, a novel method that applies
a gradient activation function (GAF) to the gradient is proposed to handle
these challenges. Intuitively, the GAF enlarges tiny gradients and
restricts large gradients. Theoretically, this paper gives the conditions that
the GAF needs to meet and, on this basis, proves that the GAF alleviates the
problems mentioned above. In addition, this paper proves that the convergence
rate of SGD with the GAF is faster than that without the GAF under some
assumptions. Furthermore, experiments on CIFAR, ImageNet, and PASCAL Visual
Object Classes (VOC) confirm the GAF's effectiveness. The experimental results also
demonstrate that the proposed method can be adopted in various deep
neural networks to improve their performance. The source code is publicly
available at
https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.
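To make the idea above concrete, here is a minimal sketch of applying a gradient activation function to the gradients before a plain SGD update. The specific activation g(x) = alpha * tanh(beta * x), the parameter values, and the helper names (gaf, sgd_step_with_gaf) are illustrative assumptions rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
# Hedged sketch: pass each gradient through a gradient activation function (GAF)
# before the SGD update. The choice g(x) = alpha * tanh(beta * x) is illustrative,
# not necessarily the paper's: with alpha * beta > 1 it enlarges tiny gradients
# (slope alpha*beta near zero) while bounding large ones by alpha.
import torch

def gaf(grad, alpha=2.0, beta=1.0):
    """Element-wise gradient activation: amplify small gradients, bound large ones."""
    return alpha * torch.tanh(beta * grad)

def sgd_step_with_gaf(model, lr=0.1, alpha=2.0, beta=1.0):
    """One plain SGD step where every parameter's gradient is passed through the GAF."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * gaf(p.grad, alpha, beta)

# Usage (assumes `model`, `loss_fn`, and a batch `(x, y)` are defined elsewhere):
#   loss = loss_fn(model(x), y)
#   model.zero_grad()
#   loss.backward()
#   sgd_step_with_gaf(model, lr=0.1)
```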
Related papers
- Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z)
- Bregman Graph Neural Network [27.64062763929748]
In node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes.
We propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance.
arXiv Detail & Related papers (2023-09-12T23:54:24Z)
- Can Unstructured Pruning Reduce the Depth in Deep Neural Networks? [5.869633234882029]
Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance.
In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance.
arXiv Detail & Related papers (2023-08-12T17:27:49Z)
- Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again [96.4999517230259]
We provide a new perspective of gradient flow to understand the substandard performance of deep GCNs.
We propose to use gradient-guided dynamic rewiring of vanilla-GCNs with skip connections.
Our methods significantly boost their performance to comfortably compete with and outperform many state-of-the-art methods.
arXiv Detail & Related papers (2022-10-14T21:30:25Z)
- Gradient Gating for Deep Multi-Rate Learning on Graphs [62.25886489571097]
We present Gradient Gating (G^2), a novel framework for improving the performance of Graph Neural Networks (GNNs).
Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph.
arXiv Detail & Related papers (2022-10-02T13:19:48Z)
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Inspired by those observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z)
- Graph Neural Networks with Adaptive Frequency Response Filter [55.626174910206046]
We develop a graph neural network framework AdaGNN with a well-smooth adaptive frequency response filter.
We empirically validate the effectiveness of the proposed framework on various benchmark datasets.
arXiv Detail & Related papers (2021-04-26T19:31:21Z)
- Tackling Over-Smoothing for General Graph Convolutional Networks [88.71154017107257]
We study how general GCNs behave as depth increases, including the generic GCN, GCN with bias, ResGCN, and APPNP.
We propose DropEdge to alleviate over-smoothing by randomly removing a certain number of edges at each training epoch.
arXiv Detail & Related papers (2020-08-22T16:14:01Z)
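As a side note on the DropEdge mechanism summarized in the last entry, the following is a minimal sketch of the per-epoch edge-dropping step. The 2 x num_edges edge-index layout, the drop rate, and the names in the usage comment (drop_edge, gcn, full_edge_index) are illustrative assumptions, not details taken from that paper.

```python
# Hedged sketch of the DropEdge idea: at each training epoch, randomly remove a
# fraction of edges from the graph before message passing. Layout and drop rate
# are illustrative assumptions.
import torch

def drop_edge(edge_index, drop_rate=0.2):
    """Randomly keep a subset of edges; call once per epoch on the full edge set."""
    num_edges = edge_index.size(1)           # edge_index assumed shaped (2, num_edges)
    keep_mask = torch.rand(num_edges) >= drop_rate
    return edge_index[:, keep_mask]

# Usage inside a training loop (hypothetical `gcn`, `features`, `labels`, `loss_fn`):
#   for epoch in range(num_epochs):
#       sampled_edges = drop_edge(full_edge_index, drop_rate=0.2)
#       out = gcn(features, sampled_edges)
#       loss = loss_fn(out, labels)
#       ...
```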
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.