Separation of scales and a thermodynamic description of feature learning in some CNNs
- URL: http://arxiv.org/abs/2112.15383v1
- Date: Fri, 31 Dec 2021 10:49:55 GMT
- Title: Separation of scales and a thermodynamic description of feature learning in some CNNs
- Authors: Inbar Seroussi and Zohar Ringel
- Abstract summary: Deep neural networks (DNNs) are powerful tools for compressing and distilling information.
A common strategy in such cases is to identify slow degrees of freedom that average out the erratic behavior of the underlying fast microscopic variables.
Here, we identify such a separation of scales occurring in over-parameterized deep convolutional neural networks (CNNs) at the end of training.
The resulting thermodynamic theory of deep learning yields accurate predictions on several deep non-linear CNN toy models.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are powerful tools for compressing and distilling
information. Due to their scale and complexity, often involving billions of
inter-dependent internal degrees of freedom, exact analysis approaches often
fall short. A common strategy in such cases is to identify slow degrees of
freedom that average out the erratic behavior of the underlying fast
microscopic variables. Here, we identify such a separation of scales occurring
in over-parameterized deep convolutional neural networks (CNNs) at the end of
training. It implies that neuron pre-activations fluctuate in a nearly Gaussian
manner with a deterministic latent kernel. While for CNNs with infinitely many
channels these kernels are inert, for finite CNNs they adapt and learn from
data in an analytically tractable manner. The resulting thermodynamic theory of
deep learning yields accurate predictions on several deep non-linear CNN toy
models. In addition, it provides new ways of analyzing and understanding CNNs.
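The key quantities in this picture are the layer-wise pre-activation kernels. As a rough numerical illustration only, and not the paper's actual derivation, the sketch below estimates such a kernel for a randomly initialized toy CNN by averaging over its many channels and checks that a single pre-activation is close to Gaussian; the paper's contribution is to show how, at finite channel number and at the end of training, these kernels stop being inert and adapt to the data in a tractable way. PyTorch, the toy architecture, and the use of initialization-time statistics are assumptions of the sketch.
```python
# Rough illustration (not the paper's derivation): estimate a pre-activation
# kernel of a toy CNN by treating the many channels of a wide hidden layer as
# i.i.d. draws of the pre-activation field, and check near-Gaussianity.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_channels = 512                        # "wide" layer: many channels

conv1 = nn.Conv2d(3, n_channels, kernel_size=3, padding=1)
conv2 = nn.Conv2d(n_channels, n_channels, kernel_size=3, padding=1)

x1 = torch.randn(1, 3, 16, 16)          # two arbitrary inputs
x2 = torch.randn(1, 3, 16, 16)

with torch.no_grad():
    # Second-layer pre-activations for each input, shape (channels, H, W).
    h1 = conv2(torch.relu(conv1(x1)))[0]
    h2 = conv2(torch.relu(conv1(x2)))[0]

# Flatten spatial dimensions; averaging products over channels estimates the
# latent kernel evaluated at each spatial location.
f1 = h1.reshape(n_channels, -1)
f2 = h2.reshape(n_channels, -1)
K11 = (f1 * f1).mean(dim=0)             # per-location variance for x1
K12 = (f1 * f2).mean(dim=0)             # per-location covariance of x1, x2

# Crude Gaussianity check at one spatial site: excess kurtosis should be ~0.
z = f1[:, 0]
excess_kurtosis = ((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0
print(f"mean K11 = {K11.mean().item():.3f}, "
      f"mean K12 = {K12.mean().item():.3f}, "
      f"excess kurtosis = {excess_kurtosis.item():.3f}")
```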
Related papers
- On the rates of convergence for learning with convolutional neural networks [9.772773527230134]
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Interpreting convolutional neural networks' low dimensional approximation to quantum spin systems [1.631115063641726]
Convolutional neural networks (CNNs) have been employed along with Variational Monte Carlo methods for finding the ground state of quantum many-body spin systems.
We provide a theoretical and experimental analysis of how CNNs optimize learning for spin systems, and investigate the CNN's low-dimensional approximation.
Our results allow us to gain a comprehensive, improved understanding of how CNNs successfully approximate quantum spin Hamiltonians.
arXiv Detail & Related papers (2022-10-03T02:49:16Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Analytic Learning of Convolutional Neural Network For Pattern Recognition [20.916630175697065]
Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive.
We propose an analytic convolutional neural network learning (ACnnL) method.
ACnnL builds a closed-form solution similar to its counterpart, but differs in its regularization constraints.
arXiv Detail & Related papers (2022-02-14T06:32:21Z)
- Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of Quantization on Depthwise Separable Convolutional Networks Through the Eyes of Multi-scale Distributional Dynamics [93.4221402881609]
MobileNets are the go-to family of deep convolutional neural networks (CNNs) for mobile applications.
They often have significant accuracy degradation under post-training quantization.
We study the multi-scale distributional dynamics of MobileNet-V1, a set of smaller DWSCNNs, and regular CNNs.
arXiv Detail & Related papers (2021-04-24T01:28:29Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- A New Neuromorphic Computing Approach for Epileptic Seizure Prediction [4.798958633851825]
CNNs are computationally expensive and power hungry.
Motivated by the energy-efficient spiking neural networks (SNNs), a neuromorphic computing approach for seizure prediction is proposed in this work.
arXiv Detail & Related papers (2021-02-25T10:39:18Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.