Separation of scales and a thermodynamic description of feature learning in some CNNs
- URL: http://arxiv.org/abs/2112.15383v1
- Date: Fri, 31 Dec 2021 10:49:55 GMT
- Title: Separation of scales and a thermodynamic description of feature learning in some CNNs
- Authors: Inbar Seroussi and Zohar Ringel
- Abstract summary: Deep neural networks (DNNs) are powerful tools for compressing and distilling information.
A common strategy in such cases is to identify slow degrees of freedom that average out the erratic behavior of the underlying fast microscopic variables.
Here, we identify such a separation of scales occurring in over-parameterized deep convolutional neural networks (CNNs) at the end of training.
The resulting thermodynamic theory of deep learning yields accurate predictions on several deep non-linear CNN toy models.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are powerful tools for compressing and distilling
information. Due to their scale and complexity, often involving billions of
inter-dependent internal degrees of freedom, exact analysis approaches often
fall short. A common strategy in such cases is to identify slow degrees of
freedom that average out the erratic behavior of the underlying fast
microscopic variables. Here, we identify such a separation of scales occurring
in over-parameterized deep convolutional neural networks (CNNs) at the end of
training. It implies that neuron pre-activations fluctuate in a nearly Gaussian
manner with a deterministic latent kernel. While for CNNs with infinitely many
channels these kernels are inert, for finite CNNs they adapt and learn from
data in an analytically tractable manner. The resulting thermodynamic theory of
deep learning yields accurate predictions on several deep non-linear CNN toy
models. In addition, it provides new ways of analyzing and understanding CNNs.
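The key quantities in this picture are the layer-wise pre-activation kernels. As a rough numerical illustration only, and not the paper's actual derivation, the sketch below estimates such a kernel for a randomly initialized toy CNN by averaging over its many channels and checks that a single pre-activation is close to Gaussian; the paper's contribution is to show how, at finite channel number and at the end of training, these kernels stop being inert and adapt to the data in a tractable way. PyTorch, the toy architecture, and the use of initialization-time statistics are assumptions of the sketch.
```python
# Rough illustration (not the paper's derivation): estimate a pre-activation
# kernel of a toy CNN by treating the many channels of a wide hidden layer as
# i.i.d. draws of the pre-activation field, and check near-Gaussianity.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_channels = 512                        # "wide" layer: many channels

conv1 = nn.Conv2d(3, n_channels, kernel_size=3, padding=1)
conv2 = nn.Conv2d(n_channels, n_channels, kernel_size=3, padding=1)

x1 = torch.randn(1, 3, 16, 16)          # two arbitrary inputs
x2 = torch.randn(1, 3, 16, 16)

with torch.no_grad():
    # Second-layer pre-activations for each input, shape (channels, H, W).
    h1 = conv2(torch.relu(conv1(x1)))[0]
    h2 = conv2(torch.relu(conv1(x2)))[0]

# Flatten spatial dimensions; averaging products over channels estimates the
# latent kernel evaluated at each spatial location.
f1 = h1.reshape(n_channels, -1)
f2 = h2.reshape(n_channels, -1)
K11 = (f1 * f1).mean(dim=0)             # per-location variance for x1
K12 = (f1 * f2).mean(dim=0)             # per-location covariance of x1, x2

# Crude Gaussianity check at one spatial site: excess kurtosis should be ~0.
z = f1[:, 0]
excess_kurtosis = ((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0
print(f"mean K11 = {K11.mean().item():.3f}, "
      f"mean K12 = {K12.mean().item():.3f}, "
      f"excess kurtosis = {excess_kurtosis.item():.3f}")
```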
Related papers
- On the rates of convergence for learning with convolutional neural networks [9.772773527230134]
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Interpreting convolutional neural networks' low dimensional approximation to quantum spin systems [1.631115063641726]
Convolutional neural networks (CNNs) have been employed along with Variational Monte Carlo methods for finding the ground state of quantum many-body spin systems.
We provide a theoretical and experimental analysis of how CNNs optimize learning for spin systems, and investigate the CNN's low-dimensional approximation.
Our results allow us to gain a comprehensive, improved understanding of how CNNs successfully approximate quantum spin Hamiltonians.
arXiv Detail & Related papers (2022-10-03T02:49:16Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Analytic Learning of Convolutional Neural Network For Pattern Recognition [20.916630175697065]
Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive.
We propose an analytic convolutional neural network learning (ACnnL) method.
ACnnL builds a closed-form solution similar to its counterpart, but differs in its regularization constraints.
arXiv Detail & Related papers (2022-02-14T06:32:21Z)
- Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of Quantization on Depthwise Separable Convolutional Networks Through the Eyes of Multi-scale Distributional Dynamics [93.4221402881609]
MobileNets are the go-to family of deep convolutional neural networks (CNNs) for mobile applications.
They often have significant accuracy degradation under post-training quantization.
We study the multi-scale distributional dynamics of MobileNet-V1, a set of smaller DWSCNNs, and regular CNNs.
arXiv Detail & Related papers (2021-04-24T01:28:29Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- A New Neuromorphic Computing Approach for Epileptic Seizure Prediction [4.798958633851825]
CNNs are computationally expensive and power hungry.
Motivated by the energy-efficient spiking neural networks (SNNs), a neuromorphic computing approach for seizure prediction is proposed in this work.
arXiv Detail & Related papers (2021-02-25T10:39:18Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.