Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- URL: http://arxiv.org/abs/2105.14368v1
- Date: Sat, 29 May 2021 20:15:53 GMT
- Title: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- Authors: Mikhail Belkin
- Abstract summary: In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges.
I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning.
- Score: 22.24486833887627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling, over-parameterization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parameterization enables interpolation and provides the flexibility to select a suitable interpolating model.
As we will see, just as a physical prism separates colors mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that a clearer understanding of these issues will bring us a step closer toward a general theory of deep learning and machine learning.
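To make the two themes concrete, here is a minimal numerical sketch (my own toy example, not a construction from the paper): a random-feature model with far more parameters than data points fits noisy one-dimensional data exactly, and the pseudoinverse selects the minimum-norm interpolant among the many models that do so. The feature map, sample sizes, and noise level are all illustrative assumptions.

```python
# Toy illustration of interpolation via over-parameterization (not from the paper):
# an over-parameterized random-feature model fits noisy 1-D data exactly, and the
# pseudoinverse picks the minimum-norm interpolating solution.
import numpy as np

rng = np.random.default_rng(0)

# Noisy training data: n points from a sine curve plus Gaussian noise.
n_train, n_features = 20, 500                 # far more parameters than data points
x_train = np.sort(rng.uniform(0.0, 1.0, n_train))
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.normal(size=n_train)

# Random Fourier features: phi(x) = cos(w * x + b), one column per feature.
w = rng.normal(scale=20.0, size=n_features)
b = rng.uniform(0.0, 2 * np.pi, n_features)
def features(x):
    return np.cos(np.outer(x, w) + b)

# Minimum-norm interpolating solution of the under-determined system Phi @ theta = y.
Phi = features(x_train)                       # shape (20, 500)
theta = np.linalg.pinv(Phi) @ y_train

# Training data is fit (essentially) exactly: interpolation.
train_residual = np.max(np.abs(Phi @ theta - y_train))

# Compare predictions with the noiseless target away from the training points.
x_test = np.linspace(0.0, 1.0, 200)
test_rmse = np.sqrt(np.mean((features(x_test) @ theta - np.sin(2 * np.pi * x_test)) ** 2))

print(f"max train residual: {train_residual:.2e}")   # ~ numerical precision
print(f"test RMSE vs. noiseless target: {test_rmse:.3f}")
```

In this toy run the training residual sits at numerical precision while the test error remains finite; how and when such exact fits of noisy data can still generalize is precisely the kind of question the interpolation literature addresses.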
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; and (iii) open the door to novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
- Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
- Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations [1.9580473532948401]
This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process.
We ask what drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality.
Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models.
arXiv Detail & Related papers (2023-10-24T19:50:41Z)
- A Message Passing Perspective on Learning Dynamics of Contrastive Learning [60.217972614379065]
We show that if we cast a contrastive objective equivalently into the feature space, then its learning dynamics admits an interpretable form.
This perspective also establishes an intriguing connection between contrastive learning and Message Passing Graph Neural Networks (MP-GNNs).
arXiv Detail & Related papers (2023-03-08T08:27:31Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain the observation that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
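For readers unfamiliar with this setting, the sketch below shows the generic form of decentralized, gossip-based data-parallel optimization that this line of work studies: each worker takes a local gradient step and then averages its model with its neighbors through a mixing matrix defined by a sparse topology. The ring topology, quadratic objectives, and step size are illustrative assumptions, not the paper's algorithm or experiments.

```python
# Minimal sketch of decentralized gradient descent with gossip averaging
# (generic D-SGD, illustrative only). Each worker i minimizes a local quadratic
# f_i(x) = 0.5 * ||x - a_i||^2, so the global optimum is the mean of the a_i.
# Workers alternate a local gradient step with averaging over a sparse ring
# topology encoded in the doubly-stochastic mixing matrix W.
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim, steps, lr = 8, 5, 200, 0.1

a = rng.normal(size=(n_workers, dim))       # local targets; optimum is a.mean(0)
x = np.zeros((n_workers, dim))              # one model copy per worker

# Ring topology: each worker mixes with itself and its two neighbors.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = 0.5
    W[i, (i - 1) % n_workers] = 0.25
    W[i, (i + 1) % n_workers] = 0.25

for _ in range(steps):
    grads = x - a                           # gradient of each local quadratic
    x = W @ (x - lr * grads)                # local step, then gossip averaging

avg_model = x.mean(axis=0)
print("distance of the average model from the optimum:",
      np.linalg.norm(avg_model - a.mean(axis=0)))
print("max worker-to-average disagreement:",
      np.max(np.linalg.norm(x - avg_model, axis=1)))
```

The average of the workers' models converges to the global optimum, while how tightly individual workers agree with it depends on the sparsity of the communication topology, which is the quantity this line of theory tries to characterize.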
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- Envisioning Future Deep Learning Theories: Some Basic Concepts and Characteristics [30.365274034429508]
We argue that a future deep learning theory should inherit three characteristics: a hierarchically structured network architecture, parameters iteratively optimized using gradient-based methods, and information from the data that evolves compressively.
We integrate these characteristics into a graphical model called neurashed, which effectively explains some common empirical patterns in deep learning.
arXiv Detail & Related papers (2021-12-17T19:51:26Z)
- Tensor Methods in Computer Vision and Deep Learning [120.3881619902096]
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning.
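As a small illustration of the point about multidimensional arrays (my own example, not taken from the review): a mini-batch of RGB images is naturally a 4-way tensor, and many tensor methods work on its matricizations (mode unfoldings) or on low-rank factorizations of them.

```python
# Illustrative sketch (not from the review): visual data as a 4-way tensor,
# its mode-0 unfolding, and a truncated SVD of that unfolding, which is the
# elementary step inside Tucker/HOSVD-style tensor methods.
import numpy as np

rng = np.random.default_rng(2)

# A mini-batch of 16 RGB images of size 32x32 is a 4-way tensor.
batch = rng.random((16, 3, 32, 32))          # (batch, channels, height, width)

# Mode-0 unfolding: keep the first axis as rows, flatten the rest into columns.
unfold0 = batch.reshape(batch.shape[0], -1)  # shape (16, 3*32*32)

# A rank-r truncated SVD of the unfolding gives a low-rank approximation.
r = 5
U, S, Vt = np.linalg.svd(unfold0, full_matrices=False)
approx = (U[:, :r] * S[:r]) @ Vt[:r]         # rank-r reconstruction

rel_err = np.linalg.norm(unfold0 - approx) / np.linalg.norm(unfold0)
print(f"relative error of a rank-{r} approximation: {rel_err:.3f}")
```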
arXiv Detail & Related papers (2021-07-07T18:42:45Z)
- Unitary Learning for Deep Diffractive Neural Network [0.0]
We present a unitary learning protocol for deep diffractive neural networks.
The temporal-space evolution characteristic in unitary learning is formulated and elucidated.
As a preliminary application, a deep diffractive neural network with unitary learning is tentatively implemented on 2D classification and verification tasks.
arXiv Detail & Related papers (2020-08-17T07:16:09Z)
- DeepNNK: Explaining deep models and their generalization using polytope interpolation [42.16401154367232]
We take a step towards a better understanding of neural networks by introducing a local polytope interpolation method.
The proposed Deep Non Negative Kernel regression (NNK) framework is non-parametric, theoretically simple, and geometrically intuitive.
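To give a rough sense of the interpolation idea behind such a framework, here is a heavily simplified sketch of non-negative kernel regression in general, not the authors' exact DeepNNK formulation (which operates on network features and leave-one-out estimates): for a query point, solve a small non-negative least-squares problem in kernel form to get sparse, non-negative weights over training points, then predict with the resulting weighted average of their labels. The Gaussian kernel, the projected-gradient solver, and all constants are my illustrative assumptions.

```python
# Simplified sketch of non-negative kernel regression (illustrative; the DeepNNK
# paper's exact formulation differs). For a query q we solve
#   min_{theta >= 0}  0.5 * theta^T K theta - theta^T k_q,
# i.e. approximate phi(q) by a non-negative combination of training points in
# feature space, then predict a weighted average of their labels.
import numpy as np

rng = np.random.default_rng(3)

def gaussian_kernel(A, B, bandwidth=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def nnk_weights(K, k_q, steps=500):
    """Projected gradient descent on the non-negative quadratic program."""
    theta = np.zeros(len(k_q))
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)   # 1/L step size
    for _ in range(steps):
        theta = np.maximum(0.0, theta - step * (K @ theta - k_q))
    return theta

# Toy 1-D regression data.
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=30)

K = gaussian_kernel(X, X)
x_query = np.array([[0.25]])
k_q = gaussian_kernel(X, x_query)[:, 0]

theta = nnk_weights(K, k_q)
support = theta > 1e-8                            # typically only a few points
prediction = (theta[support] @ y[support]) / theta[support].sum()
print(f"{support.sum()} active neighbors, prediction at 0.25: {prediction:.3f}")
```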
arXiv Detail & Related papers (2020-07-20T22:05:24Z)
- Exploiting Contextual Information with Deep Neural Networks [5.787117733071416]
In this thesis, we show that contextual information can be exploited in 2 fundamentally different ways: implicitly and explicitly.
arXiv Detail & Related papers (2020-06-21T03:40:30Z)