Understanding and Leveraging the Learning Phases of Neural Networks
- URL: http://arxiv.org/abs/2312.06887v2
- Date: Thu, 14 Dec 2023 14:32:45 GMT
- Title: Understanding and Leveraging the Learning Phases of Neural Networks
- Authors: Johannes Schneider and Mohit Prabhushankar
- Abstract summary: The learning dynamics of deep neural networks are not well understood.
We comprehensively analyze the learning dynamics by investigating a layer's reconstruction ability of the input and prediction performance.
We show the existence of three phases using common datasets and architectures such as ResNet and VGG.
- Score: 7.1169582271841625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The learning dynamics of deep neural networks are not well understood. The
information bottleneck (IB) theory proclaimed separate fitting and compression
phases. But they have since been heavily debated. We comprehensively analyze
the learning dynamics by investigating a layer's reconstruction ability of the
input and prediction performance based on the evolution of parameters during
training. We empirically show the existence of three phases using common
datasets and architectures such as ResNet and VGG: (i) near constant
reconstruction loss, (ii) decrease, and (iii) increase. We also derive an
empirically grounded data model and prove the existence of phases for
single-layer networks. Technically, our approach leverages classical complexity
analysis. It differs from IB by relying on measuring reconstruction loss rather
than information theoretic measures to relate information of intermediate
layers and inputs. Our work implies a new best practice for transfer learning:
We show empirically that the pre-training of a classifier should stop well
before its performance is optimal.
Related papers
- Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Layerwise Sparsifying Training and Sequential Learning Strategy for
Neural Architecture Adaptation [0.0]
This work presents a two-stage framework for developing neural architectures to adapt/ generalize well on a given training data set.
In the first stage, a manifold-regularized layerwise sparsifying training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
In the second stage, a sequential learning process is adopted where a sequence of small networks is employed to extract information from the residual produced in stage I.
arXiv Detail & Related papers (2022-11-13T09:51:16Z) - Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z) - The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's reconstruction ability of the input and prediction performance based on the evolution of parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a emphcovariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a emphhierarchical latent tree model (HLTM)
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks fortemporal prediction, and benefits the target even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z) - On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improve the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.