Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature
Learning and Lazy Training
- URL: http://arxiv.org/abs/2012.15110v1
- Date: Wed, 30 Dec 2020 11:00:36 GMT
- Title: Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature
Learning and Lazy Training
- Authors: Mario Geiger, Leonardo Petrini and Matthieu Wyart
- Abstract summary: Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing.
Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible.
We argue that different learning regimes can be organized into a phase diagram.
- Score: 4.318555434063275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms are responsible for a technological revolution in a
variety of tasks including image recognition or Go playing. Yet, why they work
is not understood. Ultimately, they manage to classify data lying in high
dimension -- a feat generically impossible due to the geometry of high
dimensional space and the associated curse of dimensionality. Understanding
what kind of structure, symmetry or invariance makes data such as images
learnable is a fundamental challenge. Other puzzles include that (i) learning
corresponds to minimizing a loss in high dimension, which is in general not
convex and could well get stuck in bad minima. (ii) The predictive power of deep
learning increases with the number of fitting parameters, even in a regime where data
are perfectly fitted. In this manuscript, we review recent results elucidating
(i,ii) and the perspective they offer on the (still unexplained) curse of
dimensionality paradox. We base our theoretical discussion on the $(h,\alpha)$
plane where $h$ is the network width and $\alpha$ the scale of the output of
the network at initialization, and provide new systematic measures of
performance in that plane for MNIST and CIFAR-10. We argue that different
learning regimes can be organized into a phase diagram. A line of critical
points sharply delimits an under-parametrized phase from an over-parametrized
one. In over-parametrized nets, learning can operate in two regimes separated
by a smooth cross-over. At large initialization, it corresponds to a kernel
method, whereas for small initializations features can be learnt, together with
invariants in the data. We review the properties of these different phases, of
the transition separating them and some open questions. Our treatment
emphasizes analogies with physical systems, scaling arguments and the
development of numerical observables to quantitatively test these results
empirically.
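The role of the output scale $\alpha$ in selecting the lazy versus feature-learning regime can be made concrete with a small numerical experiment. The sketch below is not the authors' code; it assumes the commonly used centered parametrization $f_\alpha(x) = \alpha\,(f(w,x) - f(w_0,x))$ together with a $1/\alpha^2$ learning-rate rescaling, and the width $h$, the toy data, the hinge loss and the class name AlphaScaledNet are illustrative choices rather than details taken from the abstract.

```python
# A minimal sketch (not the authors' code) of the alpha-scaled output
# parametrization often used to probe the lazy vs feature-learning crossover:
# the trained function is f_alpha(x) = alpha * (f(w, x) - f(w0, x)), so a
# large alpha keeps the weights close to initialization (lazy/kernel regime)
# while a small alpha forces large weight changes (feature-learning regime).
# The width h, the alpha values, the hinge loss, the 1/alpha**2 learning-rate
# rescaling and the toy data are illustrative assumptions.

import copy
import torch
import torch.nn as nn


class AlphaScaledNet(nn.Module):
    def __init__(self, d_in: int, h: int, alpha: float):
        super().__init__()
        self.alpha = alpha
        self.net = nn.Sequential(
            nn.Linear(d_in, h), nn.ReLU(),
            nn.Linear(h, 1),
        )
        # Frozen copy of the network at initialization, used to center the output.
        self.net0 = copy.deepcopy(self.net)
        for p in self.net0.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * (self.net(x) - self.net0(x))


def relative_weight_change(alpha: float, h: int = 128, steps: int = 500) -> float:
    torch.manual_seed(0)                      # same data and init for every alpha
    x = torch.randn(256, 10)                  # toy inputs, d = 10
    y = torch.sign(x[:, 0:1])                 # toy binary labels
    model = AlphaScaledNet(d_in=10, h=h, alpha=alpha)
    opt = torch.optim.SGD(model.parameters(), lr=0.05 / alpha**2)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.relu(1.0 - y * model(x)).mean()   # hinge loss
        loss.backward()
        opt.step()
    # ||w - w0|| / ||w0||: small in the lazy regime, large in the feature regime.
    with torch.no_grad():
        dw = sum((p - q).norm() ** 2 for p, q in zip(model.net.parameters(),
                                                     model.net0.parameters()))
        w0 = sum(q.norm() ** 2 for q in model.net0.parameters())
        return (dw / w0).sqrt().item()


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        print(f"alpha={alpha:>5}: relative weight change = "
              f"{relative_weight_change(alpha):.3e}")
```

Sweeping $\alpha$ at fixed width $h$ in this sketch should show the relative weight change shrinking as $\alpha$ grows, the qualitative signature of the lazy (kernel) regime, and growing at small $\alpha$, where features can be learnt.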
Related papers
- Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.
We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.
We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z) - Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions [20.036783417617652]
We investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms.
We show that a simple modification of the idealized single-pass gradient descent training scenario drastically improves its computational efficiency.
Our results highlight the ability of networks to learn relevant structures from data alone without any pre-processing.
arXiv Detail & Related papers (2024-05-24T11:34:31Z) - Why do Learning Rates Transfer? Reconciling Optimization and Scaling
Limits for Deep Learning [77.82908213345864]
We find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian is largely independent of the width and depth of the network.
We show that under the neural tangent kernel (NTK) regime, the sharpness exhibits very different dynamics at different scales, thus preventing learning rate transfer.
arXiv Detail & Related papers (2024-02-27T12:28:01Z) - Synergy and Symmetry in Deep Learning: Interactions between the Data,
Model, and Inference Algorithm [33.59320315666675]
We study the triplet (D, M, I) as an integrated system and identify important synergies that help mitigate the curse of dimensionality.
We find that learning is most efficient when these symmetries are compatible with those of the data distribution.
arXiv Detail & Related papers (2022-07-11T04:08:21Z) - Learning sparse features can lead to overfitting in neural networks [9.2104922520782]
We show that feature learning can perform worse than lazy training.
Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth.
arXiv Detail & Related papers (2022-06-24T14:26:33Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors.
Special external devices, correctors, are developed. They should provide a quick, non-iterative fix without modifying the legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Unsupervised mapping of phase diagrams of 2D systems from infinite
projected entangled-pair states via deep anomaly detection [0.0]
We demonstrate how to map out the phase diagram of a two-dimensional quantum many-body system with no prior physical knowledge.
As a benchmark, the phase diagram of the 2D frustrated bilayer Heisenberg model is analyzed.
We show that in order to get a good qualitative picture of the transition lines, it suffices to use data from the cost-efficient simple update optimization.
arXiv Detail & Related papers (2021-05-19T12:19:20Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Recurrent Multi-view Alignment Network for Unsupervised Surface
Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)