The learning phases in NN: From Fitting the Majority to Fitting a Few
- URL: http://arxiv.org/abs/2202.08299v1
- Date: Wed, 16 Feb 2022 19:11:42 GMT
- Title: The learning phases in NN: From Fitting the Majority to Fitting a Few
- Authors: Johannes Schneider
- Abstract summary: We analyze a layer's ability to reconstruct the input and its prediction performance based on the evolution of parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
- Score: 2.5991265608180396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The learning dynamics of deep neural networks are subject to controversy.
Using the information bottleneck (IB) theory, separate fitting and compression
phases have been put forward but have since been heavily debated. We approach
learning dynamics by analyzing a layer's ability to reconstruct the input and
its prediction performance based on the evolution of parameters during
training. We show that, under mild assumptions on the data, there is an initial
prototyping phase that decreases reconstruction loss, followed by a phase that
reduces the classification loss of a few samples, which in turn increases
reconstruction loss. Aside
from providing a mathematical analysis of single layer classification networks,
we also assess the behavior using common datasets and architectures from
computer vision such as ResNet and VGG.
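For illustration, the kind of probe behind this analysis can be sketched as follows: while a classifier trains, a separate linear decoder is periodically refit on a layer's frozen activations to measure how well the input can be reconstructed. The snippet below is a minimal sketch under assumed choices (synthetic data, a two-layer MLP, a linear decoder refit at each checkpoint); it is not the paper's exact experimental protocol.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data; the paper uses vision datasets with ResNet/VGG architectures.
X = torch.randn(512, 32)
y = (X[:, 0] > 0).long()

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

def reconstruction_loss(hidden, inputs, steps=200):
    """Fit a linear decoder from frozen hidden activations back to the input
    and return its final MSE as a proxy for the layer's reconstruction ability."""
    decoder = nn.Linear(hidden.shape[1], inputs.shape[1])
    dec_opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
    for _ in range(steps):
        dec_opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(hidden), inputs)
        loss.backward()
        dec_opt.step()
    return loss.item()

for epoch in range(51):
    opt.zero_grad()
    cls_loss = ce(model(X), y)
    cls_loss.backward()
    opt.step()

    if epoch % 10 == 0:
        with torch.no_grad():
            hidden = model[1](model[0](X))  # post-ReLU activations of the hidden layer
        rec = reconstruction_loss(hidden, X)
        print(f"epoch {epoch:3d}  classification loss {cls_loss.item():.4f}  reconstruction MSE {rec:.4f}")
```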
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
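As a rough illustration only, the sketch below runs gradient descent under the linear ("unhinged") loss 1 - y*f(x), the classical form from van Rooyen et al. (2015), with a time-varying learning rate; the precise loss definition and dynamics analyzed in the paper may differ.
```python
import torch

torch.manual_seed(0)

# Toy binary data with labels in {-1, +1}.
X = torch.randn(256, 16)
y = (X[:, 0] > 0).float() * 2 - 1

w = torch.zeros(16, requires_grad=True)

def unhinged_loss(margins):
    # Linear ("unhinged") loss 1 - y*f(x); an assumption standing in for
    # the exact loss studied in the paper.
    return (1.0 - margins).mean()

for step in range(100):
    lr = 0.5 / (1.0 + step)          # simple time-varying learning-rate schedule
    loss = unhinged_loss(y * (X @ w))
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        w.grad.zero_()
    if step % 20 == 0:
        print(f"step {step:3d}  lr {lr:.3f}  loss {loss.item():.4f}")
```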
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Understanding and Leveraging the Learning Phases of Neural Networks [7.1169582271841625]
The learning dynamics of deep neural networks are not well understood.
We comprehensively analyze the learning dynamics by investigating a layer's ability to reconstruct the input and its prediction performance.
We show the existence of three phases using common datasets and architectures such as ResNet and VGG.
arXiv Detail & Related papers (2023-12-11T23:20:58Z)
- Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses [28.203535970330343]
Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers.
We extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks.
We study the various factors that contribute to networks' susceptibility to such reconstruction schemes.
arXiv Detail & Related papers (2023-07-04T17:09:49Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
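A loose illustration of redistributing learning focus is to rescale a convolution kernel's gradient per spatial position via a gradient hook. The scale map below is hand-picked purely for demonstration; the paper derives its scaling from spatial gradients rather than fixing a map by hand.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Hypothetical per-position scale over the 3x3 kernel (emphasise the centre tap).
scale = torch.tensor([[0.50, 0.75, 0.50],
                      [0.75, 1.50, 0.75],
                      [0.50, 0.75, 0.50]])

# Rescale the weight gradient per spatial kernel position before the optimiser step.
conv.weight.register_hook(lambda grad: grad * scale)

opt = torch.optim.SGD(conv.parameters(), lr=0.1)
x = torch.randn(4, 3, 16, 16)
target = torch.randn(4, 8, 16, 16)

opt.zero_grad()
loss = nn.functional.mse_loss(conv(x), target)
loss.backward()   # the hook fires here and rescales conv.weight.grad
opt.step()
print(f"loss before the update: {loss.item():.4f}")
```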
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
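A simple way to observe such rank behavior is to compute the numerical rank of each layer's feature matrix. The sketch below does this for a small random MLP; the architecture and tolerance are illustrative choices, not those of the paper.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def effective_rank(features, tol=1e-3):
    """Numerical rank of a feature matrix: the number of singular values
    above tol times the largest singular value."""
    s = torch.linalg.svdvals(features)
    return int((s > tol * s[0]).sum())

# A small random MLP; print the numerical rank of the feature matrix after each layer.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(6)])

with torch.no_grad():
    h = torch.randn(256, 64)
    for i, block in enumerate(blocks):
        h = block(h)
        print(f"layer {i}: effective rank {effective_rank(h)} / {h.shape[1]}")
```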
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risk and fail to generalize well on test or mislabeled data.
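This layer-wise view can be made concrete with linear probes: fit a linear classifier on the activations after each block and compare its train and test accuracy. The sketch below shows only the measurement recipe, on synthetic data and an untrained stack of blocks; in practice one would probe a trained model on real data.
```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)

# Synthetic stand-in for a dataset: train/test splits drawn from the same rule.
def make_split(n):
    X = torch.randn(n, 32)
    y = (X[:, :2].sum(dim=1) > 0).long()
    return X, y

Xtr, ytr = make_split(1000)
Xte, yte = make_split(1000)

# Stacked blocks standing in for the early vs. deep layers discussed above.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(4)])

def probe_accuracy(layer_idx):
    """Fit a linear probe on activations after `layer_idx` blocks and
    return its train and test accuracy."""
    with torch.no_grad():
        htr, hte = Xtr, Xte
        for block in blocks[: layer_idx + 1]:
            htr, hte = block(htr), block(hte)
    clf = LogisticRegression(max_iter=1000).fit(htr.numpy(), ytr.numpy())
    return clf.score(htr.numpy(), ytr.numpy()), clf.score(hte.numpy(), yte.numpy())

for i in range(len(blocks)):
    tr_acc, te_acc = probe_accuracy(i)
    print(f"after block {i}: probe train acc {tr_acc:.3f}, test acc {te_acc:.3f}")
```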
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- A Dataset-Dispersion Perspective on Reconstruction Versus Recognition in Single-View 3D Reconstruction Networks [16.348294592961327]
We introduce the dispersion score, a new data-driven metric, to quantify dataset dispersion and study its effect on NNs.
We show that the proposed metric is a principal way to analyze reconstruction quality and provides novel information in addition to the conventional reconstruction score.
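The exact definition of the dispersion score is not given in this summary; as a stand-in, the sketch below uses within-cluster variance after k-means, normalised by total variance, to contrast clumped versus spread-out feature sets. This proxy is an assumption for illustration, not the paper's metric.
```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def dispersion_proxy(features, n_clusters=10):
    """Rough dispersion proxy: within-cluster variance after k-means, normalised
    by total variance. Low values mean the features collapse onto a few modes;
    high values mean they are spread out. Illustrative stand-in only."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    within = km.inertia_ / len(features)
    total = features.var(axis=0).sum()
    return within / total

# Two toy feature sets: one clumped around a few prototypes, one widely spread.
clumped = np.repeat(rng.normal(size=(5, 64)), 200, axis=0) + 0.05 * rng.normal(size=(1000, 64))
spread = rng.normal(size=(1000, 64))

print("clumped features:", round(dispersion_proxy(clumped), 4))
print("spread features: ", round(dispersion_proxy(spread), 4))
```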
arXiv Detail & Related papers (2021-11-30T06:33:35Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both training set size and model size significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)