Related papers: Understanding and Leveraging the Learning Phases of Neural Networks

Understanding and Leveraging the Learning Phases of Neural Networks

URL: http://arxiv.org/abs/2312.06887v2
Date: Thu, 14 Dec 2023 14:32:45 GMT
Title: Understanding and Leveraging the Learning Phases of Neural Networks
Authors: Johannes Schneider and Mohit Prabhushankar
Abstract summary: The learning dynamics of deep neural networks are not well understood. We comprehensively analyze the learning dynamics by investigating a layer's reconstruction ability of the input and prediction performance. We show the existence of three phases using common datasets and architectures such as ResNet and VGG.
Score: 7.1169582271841625
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The learning dynamics of deep neural networks are not well understood. The information bottleneck (IB) theory proclaimed separate fitting and compression phases. But they have since been heavily debated. We comprehensively analyze the learning dynamics by investigating a layer's reconstruction ability of the input and prediction performance based on the evolution of parameters during training. We empirically show the existence of three phases using common datasets and architectures such as ResNet and VGG: (i) near constant reconstruction loss, (ii) decrease, and (iii) increase. We also derive an empirically grounded data model and prove the existence of phases for single-layer networks. Technically, our approach leverages classical complexity analysis. It differs from IB by relying on measuring reconstruction loss rather than information theoretic measures to relate information of intermediate layers and inputs. Our work implies a new best practice for transfer learning: We show empirically that the pre-training of a classifier should stop well before its performance is optimal.

Related papers

Binarized Neural Networks Converge Toward Algorithmic Simplicity: Empirical Support for the Learning-as-Compression Hypothesis [36.24954635616374]
We propose a shift toward algorithmic information theory, using Binarized Neural Networks (BNNs) as a first proxy.<n>We apply the Block Decomposition Method (BDM) and demonstrate it more closely tracks structural changes during training than entropy.<n>These results support the view of training as a process of algorithmic compression, where learning corresponds to the progressive internalization of structured regularities.
arXiv Detail & Related papers (2025-05-27T02:51:36Z)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard. The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
The mechanistic basis of data dependence and abrupt learning in an in-context classification task [0.3626013617212666]
We show that specific distributional properties inherent in language control the trade-off or simultaneous appearance of two forms of learning. In-context learning is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning. We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL.
arXiv Detail & Related papers (2023-12-03T20:53:41Z)
Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's reconstruction ability of the input and prediction performance based on the evolution of parameters during training. We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z)
With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning. Early layers generally learn representations relevant to performance on both training data and testing data. Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks. We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a emphcovariance operator. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a emphhierarchical latent tree model (HLTM)
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time. We find that increasing both the training set and model sizes significantly improve the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.