Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
- URL: http://arxiv.org/abs/2002.09237v1
- Date: Fri, 21 Feb 2020 11:38:17 GMT
- Title: Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
- Authors: Karim Huesmann, Soeren Klemm, Lars Linsen and Benjamin Risse
- Abstract summary: Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets.
We propose novel targeted sparsity visualization and regularization strategies to counteract overfitting.
- Score: 1.3764085113103217
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Overfitting is one of the most common problems when training deep neural
networks on comparatively small datasets. Here, we demonstrate that neural
network activation sparsity is a reliable indicator of overfitting, which we
utilize to propose novel targeted sparsity visualization and regularization
strategies. Based on these strategies, we are able to understand and counteract
overfitting caused by activation sparsity and filter correlation in a targeted
layer-by-layer manner. Our results demonstrate that targeted sparsity
regularization can efficiently be used to regularize well-known datasets and
architectures with a significant increase in image classification performance
while outperforming both dropout and batch normalization. Ultimately, our study
reveals novel insights into the contradicting concepts of activation sparsity
and network capacity by demonstrating that targeted sparsity regularization
enables salient and discriminative feature learning while exploiting the full
capacity of deep models without suffering from overfitting, even when trained
excessively.
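The abstract alone does not fix an implementation, but the core mechanism it describes, a per-layer penalty that counteracts overly sparse activations, can be sketched in PyTorch. Everything below (the class name, the hook mechanics, and the entropy-based penalty form) is an assumed stand-in for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class TargetedSparsityRegularizer:
    """Penalize low activation entropy (high sparsity) in chosen layers.

    Illustrative sketch only: forward hooks collect activations, and an
    entropy-based term pushes layers flagged as overly sparse back toward
    denser firing, applied layer-by-layer rather than globally.
    """

    def __init__(self, model, layer_names, strength=1e-3):
        self.strength = strength
        self._acts = {}
        for name, module in model.named_modules():
            if name in layer_names:
                module.register_forward_hook(self._make_hook(name))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            self._acts[name] = output
        return hook

    def penalty(self):
        # Call after the forward pass and add the result to the task loss.
        total = 0.0
        for act in self._acts.values():
            a = act.flatten(start_dim=1).abs() + 1e-8
            p = a / a.sum(dim=1, keepdim=True)       # per-sample distribution
            entropy = -(p * p.log()).sum(dim=1).mean()
            total = total - self.strength * entropy  # reward dense activity
        return total
```

In a training loop, `loss = criterion(model(x), y) + reg.penalty()` would then discourage the sparsity patterns diagnosed as precursors of overfitting.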
Related papers
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization which we call Normalize-and-Project.
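A minimal sketch of such a reparameterization in PyTorch, assuming its simplest form (projecting every weight matrix back to a fixed norm after each optimizer step); the function name and per-module policy are illustrative, not the paper's exact procedure:

```python
import torch

@torch.no_grad()
def project_weights(model, target_norm=1.0):
    """Re-project weights to a fixed norm after each optimizer step.

    Sketch of the normalize-and-project idea: with normalization layers,
    growth in ||W|| only shrinks the effective learning rate, so pinning
    the norm makes the learning rate schedule explicit.
    """
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            w = module.weight
            w.mul_(target_norm / (w.norm() + 1e-12))
```

Called right after `optimizer.step()`, this keeps parameter norms constant, so any decay in the effective learning rate must be written into the schedule rather than emerging implicitly from norm growth.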
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- Towards Improving Robustness Against Common Corruptions using Mixture of Class Specific Experts [10.27974860479791]
This paper introduces a novel paradigm known as the Mixture of Class-Specific Expert Architecture.
The proposed architecture aims to mitigate vulnerabilities associated with common neural network structures.
arXiv Detail & Related papers (2023-11-16T20:09:47Z)
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Clustering-Based Interpretation of Deep ReLU Network [17.234442722611803]
We recognize that the non-linear behavior of the ReLU function gives rise to a natural clustering.
We propose a method to increase the level of interpretability of a fully connected feedforward ReLU neural network.
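The clustering can be made concrete: each input switches each ReLU on or off, and inputs sharing the same on/off pattern are processed by the same linear piece of the network. A hypothetical sketch (the helper below is illustrative, not the paper's method):

```python
import torch

def relu_activation_patterns(layer, inputs):
    """Group inputs by the binary on/off pattern of one ReLU layer.

    Inputs with identical patterns fall in the same linear region of the
    network, which yields the natural clustering described above.
    """
    pre_act = layer(inputs)                  # pre-activations, e.g. nn.Linear
    patterns = (pre_act > 0).to(torch.int8)  # 1 where ReLU would fire
    clusters = {}
    for i, pattern in enumerate(patterns):
        key = tuple(pattern.flatten().tolist())
        clusters.setdefault(key, []).append(i)
    return clusters
```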
arXiv Detail & Related papers (2021-10-13T09:24:11Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep neural networks has been their fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks [1.9424280683610138]
Overfitting is one of the fundamental challenges when training convolutional neural networks.
In this study, we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures.
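As a rough illustration of a perplexity-style sparsity measure (the paper's exact definition may differ), one can treat the normalized activation magnitudes of a layer as a probability distribution and exponentiate its entropy:

```python
import torch

def layer_perplexity(activations, eps=1e-8):
    """Perplexity of one layer's normalized activation magnitudes.

    Hedged sketch: a value near 1 means one unit dominates (very sparse),
    while a value near the unit count means activity is spread evenly.
    """
    a = activations.flatten().abs() + eps
    p = a / a.sum()
    entropy = -(p * p.log()).sum()
    return torch.exp(entropy)
```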
arXiv Detail & Related papers (2021-04-13T12:55:37Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
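Magnitude pruning is the simplest of the removal schemes such surveys cover; a hedged sketch, not any specific method from the paper:

```python
import torch

@torch.no_grad()
def magnitude_prune(weight, sparsity=0.9):
    """Zero out the smallest-magnitude entries of a weight tensor.

    Illustrative only: practical pipelines combine this with re-training
    and, in prune-and-grow schemes, with regrowing new connections.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask.to(weight.dtype))
    return mask
```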
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
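A minimal sketch of the simplest reading of this idea (one sigmoid-gated scalar per incoming edge, learned jointly with the network); all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LearnableConnectivity(nn.Module):
    """Aggregate predecessor features over learnable edge weights.

    Sketch of the topological view above: nodes are feature maps, and a
    differentiable gate per edge lets gradient descent decide how strongly
    each predecessor contributes.
    """

    def __init__(self, num_inputs):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(num_inputs))

    def forward(self, input_features):
        # input_features: list of same-shaped tensors from predecessor nodes.
        gates = torch.sigmoid(self.edge_logits)
        stacked = torch.stack(input_features, dim=0)
        shape = (-1,) + (1,) * (stacked.dim() - 1)
        return (gates.view(shape) * stacked).sum(dim=0)
```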
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
- AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
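A hedged sketch of a progressive activation penalty in this spirit; the linear ramp and plain L2 form are assumptions, not the paper's exact AL2 formulation:

```python
import torch

def progressive_activation_loss(activations, epoch, max_epochs, max_weight=1e-4):
    """Progressively penalize activation magnitudes during training.

    Sketch only: the coefficient ramps up linearly, so early epochs fit
    the data freely and later epochs squeeze the representations.
    """
    weight = max_weight * min(1.0, epoch / max_epochs)
    return weight * sum(a.pow(2).mean() for a in activations)
```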
arXiv Detail & Related papers (2020-03-07T18:38:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.