Activated Gradients for Deep Neural Networks
- URL: http://arxiv.org/abs/2107.04228v1
- Date: Fri, 9 Jul 2021 06:00:55 GMT
- Title: Activated Gradients for Deep Neural Networks
- Authors: Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, and Mingsheng Shang
- Abstract summary: Deep neural networks often suffer from poor performance or even training failure due to the ill-conditioned problem.
In this paper, a novel method that applies a gradient activation function (GAF) to the gradient is proposed to handle these challenges.
- Score: 9.476778519758426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks often suffer from poor performance or even training
failure due to the ill-conditioned problem, the vanishing/exploding gradient
problem, and the saddle point problem. In this paper, a novel method that applies
a gradient activation function (GAF) to the gradient is proposed to handle
these challenges. Intuitively, the GAF enlarges tiny gradients and
restricts large gradients. Theoretically, this paper gives the conditions that
the GAF needs to meet and, on this basis, proves that the GAF alleviates the
problems mentioned above. In addition, this paper proves that the convergence
rate of SGD with the GAF is faster than that without the GAF under some
assumptions. Furthermore, experiments on CIFAR, ImageNet, and PASCAL Visual
Object Classes (VOC) confirm the GAF's effectiveness. The experimental results also
demonstrate that the proposed method can be adopted in various deep
neural networks to improve their performance. The source code is publicly
available at
https://github.com/LongJin-lab/Activated-Gradients-for-Deep-Neural-Networks.
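To make the idea above concrete, here is a minimal sketch of applying a gradient activation function to the gradients before a plain SGD update. The specific activation g(x) = alpha * tanh(beta * x), the parameter values, and the helper names (gaf, sgd_step_with_gaf) are illustrative assumptions rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
# Hedged sketch: pass each gradient through a gradient activation function (GAF)
# before the SGD update. The choice g(x) = alpha * tanh(beta * x) is illustrative,
# not necessarily the paper's: with alpha * beta > 1 it enlarges tiny gradients
# (slope alpha*beta near zero) while bounding large ones by alpha.
import torch

def gaf(grad, alpha=2.0, beta=1.0):
    """Element-wise gradient activation: amplify small gradients, bound large ones."""
    return alpha * torch.tanh(beta * grad)

def sgd_step_with_gaf(model, lr=0.1, alpha=2.0, beta=1.0):
    """One plain SGD step where every parameter's gradient is passed through the GAF."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * gaf(p.grad, alpha, beta)

# Usage (assumes `model`, `loss_fn`, and a batch `(x, y)` are defined elsewhere):
#   loss = loss_fn(model(x), y)
#   model.zero_grad()
#   loss.backward()
#   sgd_step_with_gaf(model, lr=0.1)
```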
Related papers
- Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z)
- Bregman Graph Neural Network [27.64062763929748]
In node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes.
We propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance.
arXiv Detail & Related papers (2023-09-12T23:54:24Z)
- Can Unstructured Pruning Reduce the Depth in Deep Neural Networks? [5.869633234882029]
Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance.
In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance.
arXiv Detail & Related papers (2023-08-12T17:27:49Z)
- Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again [96.4999517230259]
We provide a new perspective of gradient flow to understand the substandard performance of deep GCNs.
We propose to use gradient-guided dynamic rewiring of vanilla-GCNs with skip connections.
Our methods significantly boost their performance to comfortably compete with and outperform many state-of-the-art methods.
arXiv Detail & Related papers (2022-10-14T21:30:25Z)
- Gradient Gating for Deep Multi-Rate Learning on Graphs [62.25886489571097]
We present Gradient Gating (G^2), a novel framework for improving the performance of Graph Neural Networks (GNNs).
Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph.
arXiv Detail & Related papers (2022-10-02T13:19:48Z)
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Inspired by those observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z)
- Graph Neural Networks with Adaptive Frequency Response Filter [55.626174910206046]
We develop a graph neural network framework AdaGNN with a well-smooth adaptive frequency response filter.
We empirically validate the effectiveness of the proposed framework on various benchmark datasets.
arXiv Detail & Related papers (2021-04-26T19:31:21Z)
- Tackling Over-Smoothing for General Graph Convolutional Networks [88.71154017107257]
We study how general GCNs behave as depth increases, including the generic GCN, GCN with bias, ResGCN, and APPNP.
We propose DropEdge to alleviate over-smoothing by randomly removing a certain number of edges at each training epoch.
arXiv Detail & Related papers (2020-08-22T16:14:01Z)
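As a side note on the DropEdge mechanism summarized in the last entry, the following is a minimal sketch of the per-epoch edge-dropping step. The 2 x num_edges edge-index layout, the drop rate, and the names in the usage comment (drop_edge, gcn, full_edge_index) are illustrative assumptions, not details taken from that paper.

```python
# Hedged sketch of the DropEdge idea: at each training epoch, randomly remove a
# fraction of edges from the graph before message passing. Layout and drop rate
# are illustrative assumptions.
import torch

def drop_edge(edge_index, drop_rate=0.2):
    """Randomly keep a subset of edges; call once per epoch on the full edge set."""
    num_edges = edge_index.size(1)           # edge_index assumed shaped (2, num_edges)
    keep_mask = torch.rand(num_edges) >= drop_rate
    return edge_index[:, keep_mask]

# Usage inside a training loop (hypothetical `gcn`, `features`, `labels`, `loss_fn`):
#   for epoch in range(num_epochs):
#       sampled_edges = drop_edge(full_edge_index, drop_rate=0.2)
#       out = gcn(features, sampled_edges)
#       loss = loss_fn(out, labels)
#       ...
```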
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.