Recent advances in deep learning theory
- URL: http://arxiv.org/abs/2012.10931v2
- Date: Thu, 11 Mar 2021 09:16:45 GMT
- Title: Recent advances in deep learning theory
- Authors: Fengxiang He, Dacheng Tao
- Abstract summary: This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) concerns in ethics and security and their relationships with generalizability.
- Score: 104.01582662336256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning is usually described as an experiment-driven field under
continuous criticism for lacking theoretical foundations. This problem has been
partially addressed by a large volume of literature which has so far not been well
organized. This paper reviews and organizes the recent advances in deep
learning theory. The literature is categorized into six groups: (1) complexity
and capacity-based approaches for analyzing the generalizability of deep
learning; (2) stochastic differential equations and their dynamic systems for
modelling stochastic gradient descent and its variants, which characterize the
optimization and generalization of deep learning, partially inspired by
Bayesian inference; (3) the geometrical structures of the loss landscape that
drive the trajectories of the dynamic systems; (4) the roles of
over-parameterization of deep neural networks from both positive and negative
perspectives; (5) theoretical foundations of several special structures in
network architectures; and (6) the increasingly intensive concerns in ethics
and security and their relationships with generalizability.
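Group (2) can be made concrete with the standard continuous-time view of SGD: the discrete update is approximated by a stochastic differential equation whose drift is the negative gradient and whose diffusion term models minibatch noise. Below is a minimal sketch in Python, assuming a toy quadratic loss and an Euler-Maruyama discretization (the constants and function names are illustrative, not taken from the paper):

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * theta**2, so grad L(theta) = theta.
# SGD with learning rate eta and per-step gradient noise of scale sigma is
# commonly modelled by the SDE  d theta = -grad L(theta) dt + sqrt(eta) * sigma dW.
rng = np.random.default_rng(0)
eta, sigma, steps = 0.1, 0.5, 1000

def grad_loss(theta):
    return theta  # gradient of 0.5 * theta**2

# Discrete SGD with noisy (minibatch-like) gradient estimates.
theta_sgd = 2.0
for _ in range(steps):
    noisy_grad = grad_loss(theta_sgd) + sigma * rng.standard_normal()
    theta_sgd -= eta * noisy_grad

# Euler-Maruyama discretization of the SDE, one step of size dt per SGD step.
theta_sde, dt = 2.0, eta
for _ in range(steps):
    theta_sde += (-grad_loss(theta_sde) * dt
                  + np.sqrt(eta) * sigma * np.sqrt(dt) * rng.standard_normal())

# Both settle into fluctuations around the minimum at 0 with matching spread.
print(f"SGD: {theta_sgd:+.3f}   SDE: {theta_sde:+.3f}")
```

With the noise scales matched as above, the two trajectories fluctuate around the minimum with comparable spread, which is the correspondence the SDE-based analyses build on.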
Related papers
- Category-Theoretical and Topos-Theoretical Frameworks in Machine Learning: A Survey [4.686566164138397]
We provide an overview of category theory-derived machine learning from four mainstream perspectives.
For the first three topics, we primarily review research in the past five years, updating and expanding on the previous survey.
The fourth topic, which delves into higher category theory, particularly topos theory, is surveyed for the first time in this paper.
arXiv Detail & Related papers (2024-08-26T04:39:33Z)
- Foundations and Frontiers of Graph Learning Theory [81.39078977407719]
Recent advancements in graph learning have revolutionized the way we understand and analyze data with complex structures.
Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm.
This article provides a comprehensive summary of the theoretical foundations and breakthroughs concerning the approximation and learning behaviors intrinsic to prevalent graph learning models.
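As background for what such approximation and learning results are about, here is a minimal sketch of one message-passing (graph-convolution) layer, the basic building block these analyses target; the symmetric normalization is one common choice, used here for illustration:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate neighbours, then transform.

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) learned weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    deg = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # symmetric normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)      # aggregate, transform, ReLU

# Tiny example: a 3-node path graph with 2-dimensional node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.default_rng(0).standard_normal((3, 2))
W = np.random.default_rng(1).standard_normal((2, 4))
print(gcn_layer(A, H, W).shape)  # (3, 4)
```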
arXiv Detail & Related papers (2024-07-03T14:07:41Z)
- Fundamental Components of Deep Learning: A category-theoretic approach [0.0]
This thesis develops a novel mathematical foundation for deep learning based on the language of category theory.
We also systematise many existing approaches, placing a wide range of constructions and concepts under the same umbrella.
arXiv Detail & Related papers (2024-03-13T01:29:40Z)
- Envisioning Future Deep Learning Theories: Some Basic Concepts and Characteristics [30.365274034429508]
We argue that a future deep learning theory should inherit three characteristics: a hierarchically structured network architecture, parameters iteratively optimized using gradient-based methods, and information from the data that evolves compressively.
We integrate these characteristics into a graphical model called neurashed, which effectively explains some common empirical patterns in deep learning.
arXiv Detail & Related papers (2021-12-17T19:51:26Z)
- Unified Field Theory for Deep and Recurrent Neural Networks [56.735884560668985]
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
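The lowest-order (Gaussian process) picture is easy to probe numerically: as the width n grows, the output of a randomly initialized network at a fixed input becomes increasingly Gaussian, with finite-size deviations of order $1/n$. A minimal sketch, assuming a single ReLU layer with standard 1/sqrt(n) scaling (a generic illustration, not the paper's derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])  # one fixed input, d = 3

def random_net_output(n, trials=5000):
    """Scalar output of a width-n random ReLU network, sampled over weights."""
    W = rng.standard_normal((trials, n, x.size)) / np.sqrt(x.size)
    v = rng.standard_normal((trials, n)) / np.sqrt(n)
    h = np.maximum(np.einsum('tnd,d->tn', W, x), 0.0)  # hidden ReLU features
    return np.einsum('tn,tn->t', v, h)                 # linear readout

for n in (2, 16, 256):
    out = random_net_output(n)
    # Excess kurtosis of the output tends to 0 as n grows: the deviations
    # from the Gaussian-process limit are the finite-size (1/n) corrections.
    kurt = np.mean((out - out.mean()) ** 4) / out.var() ** 2 - 3.0
    print(f"n={n:4d}  excess kurtosis = {kurt:+.3f}")
```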
arXiv Detail & Related papers (2021-12-10T15:06:11Z)
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
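The rank deficiency is straightforward to observe on a small deep linear network: the Hessian of the squared loss at a generic point has rank strictly below the parameter count. A minimal numerical sketch using central finite differences (the network sizes and tolerance are illustrative choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 50                        # input dimension, number of samples
X = rng.standard_normal((d, n))
Y = rng.standard_normal((1, n))
shapes = [(2, d), (1, 2)]           # two-layer linear net: y = W2 @ W1 @ x
p = sum(a * b for a, b in shapes)   # total parameter count (8 here)

def loss(theta):
    """Squared loss of the deep linear network, parameters flattened."""
    k = shapes[0][0] * shapes[0][1]
    W1 = theta[:k].reshape(shapes[0])
    W2 = theta[k:].reshape(shapes[1])
    R = W2 @ W1 @ X - Y
    return 0.5 * np.sum(R * R)

def hessian_fd(f, theta, eps=1e-4):
    """Central-difference Hessian of f at theta."""
    H = np.zeros((p, p))
    E = eps * np.eye(p)
    for i in range(p):
        for j in range(p):
            H[i, j] = (f(theta + E[i] + E[j]) - f(theta + E[i] - E[j])
                       - f(theta - E[i] + E[j]) + f(theta - E[i] - E[j])) / (4 * eps**2)
    return H

theta0 = rng.standard_normal(p)
H = hessian_fd(loss, theta0)
# Tolerance absorbs finite-difference noise; the generic nonzero singular
# values are orders of magnitude larger. Rank comes out below p.
print("parameters:", p, " Hessian rank:", np.linalg.matrix_rank(H, tol=1e-3))
```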
arXiv Detail & Related papers (2021-06-30T17:29:58Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that also extend to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
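One familiar special case of this correspondence: the forward pass of a sigmoid unit coincides with exact inference of a binary variable's conditional in a logistic model, the kind of probabilistic reading the CG interpretation generalizes. A toy check (the specific model here is illustrative, not the paper's construction):

```python
import numpy as np

# For a binary unit h with parents x, a logistic (pairwise) model
#   p(h = 1 | x) = exp(w @ x + b) / (1 + exp(w @ x + b))
# has exactly the sigmoid forward pass as its conditional.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w, b = rng.standard_normal(4), 0.3

logit = w @ x + b
# Conditional computed from the unnormalized probabilities of h in {0, 1}.
p_unnorm = np.array([1.0, np.exp(logit)])   # exp(0) and exp(logit)
p_h1 = p_unnorm[1] / p_unnorm.sum()

print(np.isclose(p_h1, sigmoid(logit)))     # True: forward pass == inference
```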
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
- Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
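The flavour of these results can be illustrated with an unrolled-optimizer "reasoning layer": a layer that runs k gradient steps on an inner objective, so the algorithm's convergence rate directly shapes the layer's output. A minimal sketch (the quadratic inner problem and step size are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def reasoning_layer(Q, b, k, step):
    """Unrolled optimizer as a layer: k gradient steps on 0.5 z'Qz - b'z.

    The layer's output approximates argmin_z of the inner objective; how
    fast it converges as k grows is the kind of algorithm property the
    theory relates to end-to-end approximation and generalization.
    """
    z = np.zeros_like(b)
    for _ in range(k):
        z -= step * (Q @ z - b)   # gradient of the inner objective
    return z

rng = np.random.default_rng(0)
Q = np.diag([1.0, 2.0, 4.0])      # well-conditioned inner problem
b = rng.standard_normal(3)
z_star = np.linalg.solve(Q, b)    # exact inner solution

for k in (1, 5, 50):
    err = np.linalg.norm(reasoning_layer(Q, b, k, step=0.2) - z_star)
    print(f"k={k:3d}  distance to optimum: {err:.2e}")
```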
arXiv Detail & Related papers (2020-06-24T00:26:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.