AL2: Progressive Activation Loss for Learning General Representations in
Classification Neural Networks
- URL: http://arxiv.org/abs/2003.03633v1
- Date: Sat, 7 Mar 2020 18:38:46 GMT
- Title: AL2: Progressive Activation Loss for Learning General Representations in
Classification Neural Networks
- Authors: Majed El Helou, Frederike Dümbgen, Sabine Süsstrunk
- Abstract summary: We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
- Score: 12.14537824884951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large capacity of neural networks enables them to learn complex
functions. However, avoiding overfitting requires large amounts of training
data, which can be expensive and time-consuming to collect. A common practical
approach to attenuate overfitting is the use of network regularization
techniques. We propose a novel regularization method that progressively
penalizes the magnitude of activations during training. The combined activation
signals produced by all neurons in a given layer form the representation of the
input image in that feature space. We propose to regularize this representation
in the last feature layer before classification layers. Our method's effect on
generalization is analyzed with label randomization tests and cumulative
ablations. Experimental results show the advantages of our approach in
comparison with commonly-used regularizers on standard benchmark datasets.
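The progressive activation penalty described in the abstract can be sketched as follows. The linear ramp schedule, the `lambda_max` value, and the mean-squared formulation are illustrative assumptions; the paper's exact schedule and weighting are not given in this summary.

```python
import numpy as np

def al2_penalty(features, step, total_steps, lambda_max=0.01):
    """Progressive L2 penalty on the last-feature-layer activations.

    The penalty weight ramps linearly from 0 to lambda_max over
    training, so the activation magnitudes are penalized progressively.
    (Schedule and lambda_max are illustrative, not the paper's values.)
    """
    weight = lambda_max * step / total_steps
    return weight * np.mean(features ** 2)

# For identical features, the penalty grows as training progresses.
feats = np.ones((4, 8))          # batch of 4 samples, 8-dim features
early = al2_penalty(feats, step=10, total_steps=1000)
late = al2_penalty(feats, step=900, total_steps=1000)
assert late > early
```

In practice this term would be added to the classification loss at each step, acting only on the representation in the last feature layer before the classifier, as the abstract describes.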
Related papers
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree
Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
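The low-degree spectral bias this entry refers to is measured through the Walsh-Hadamard (Boolean Fourier) spectrum of the learned function. A minimal sketch of computing that spectrum is below, using the 3-bit parity function, whose energy sits entirely at the highest-degree frequency and which is therefore the kind of function a low-degree-biased network fits poorly; the helper name is an assumption, and the paper's regularizer itself is more involved.

```python
import numpy as np
from itertools import product

def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a length-2^n vector of
    function values f(x), x in {0,1}^n (natural binary ordering).
    Returns the normalized Fourier coefficients."""
    v = np.array(values, dtype=float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                # In-place butterfly: sum and difference of paired entries.
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v / len(v)

# Parity of 3 bits in +/-1 encoding: all spectral energy lies on the
# single highest-degree coefficient (index 0b111 = 7).
xs = list(product([0, 1], repeat=3))
f = np.array([(-1) ** (x[0] ^ x[1] ^ x[2]) for x in xs])
coeffs = walsh_hadamard(f)
```

A functional regularizer in this spirit would penalize or encourage particular regions of this spectrum during training rather than computing it exactly, since the exact transform is exponential in the input dimension.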
arXiv Detail & Related papers (2023-05-16T20:06:01Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics; they exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - On the optimization and generalization of overparameterized implicit
neural networks [25.237054775800164]
Implicit neural networks have become increasingly attractive in the machine learning community.
We show that global convergence is guaranteed, even if only the implicit layer is trained.
This paper investigates the generalization error for implicit neural networks.
arXiv Detail & Related papers (2022-09-30T16:19:46Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Łojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Compressive Sensing and Neural Networks from a Statistical Learning
Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z) - Exploiting the Full Capacity of Deep Neural Networks while Avoiding
Overfitting by Targeted Sparsity Regularization [1.3764085113103217]
Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets.
We propose novel targeted sparsity visualization and regularization strategies to counteract overfitting.
arXiv Detail & Related papers (2020-02-21T11:38:17Z)
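Targeted sparsity regularization, in the spirit of the last entry above, can be sketched as an L1 penalty applied only to selected layers' activations; the layer names, the `weight` value, and the mean-based formulation below are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

def targeted_sparsity_penalty(activations_by_layer, target_layers, weight=1e-3):
    """Illustrative L1 sparsity penalty restricted to chosen
    ('targeted') layers, pushing their activations toward zero
    to counteract overfitting on small datasets."""
    total = 0.0
    for name in target_layers:
        total += np.abs(activations_by_layer[name]).mean()
    return weight * total

# Penalize only the activations of the hypothetical layer "fc2".
acts = {"fc1": np.array([[1.0, -2.0], [0.0, 3.0]]),
        "fc2": np.array([[0.5, 0.0], [0.0, 0.0]])}
p = targeted_sparsity_penalty(acts, target_layers=["fc2"])
```

Restricting the penalty to specific layers is what distinguishes a "targeted" scheme from blanket sparsity regularization applied uniformly across the network.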
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.