A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in
the Wasserstein Space
- URL: http://arxiv.org/abs/2102.09235v1
- Date: Thu, 18 Feb 2021 09:37:49 GMT
- Title: A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in
the Wasserstein Space
- Authors: Kuo Gai and Shihua Zhang
- Abstract summary: We build the connection between deep neural networks (DNNs) and dynamical systems.
Drawing on optimal transport theory, we find that a DNN with weight decay attempts to learn the geodesic curve in the Wasserstein space.
We conclude that a mathematical principle of deep learning is to learn the geodesic curve in the Wasserstein space.
- Score: 2.66512000865131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have revealed a mathematical connection between deep neural
networks (DNNs) and dynamical systems. However, the fundamental principle of
DNNs has not been fully characterized through dynamical systems in terms of
optimization and generalization. To this end, we connect DNNs with the
continuity equation, in which measure is conserved, to model the forward
propagation process of a DNN; this has not been addressed before. A DNN
learns the transformation of the input distribution into the output one.
However, in the measure space there are infinitely many curves connecting two
distributions. Which one leads to good optimization and generalization for a
DNN? Drawing on optimal transport theory, we find that a DNN with weight
decay attempts to learn the geodesic curve in the Wasserstein space, which is
induced by the optimal transport map. Compared with a plain network, ResNet
is a better approximation to the geodesic curve, which explains why ResNet
can be optimized better and generalize better. Numerical experiments show
that the data tracks of both plain networks and ResNets tend to be
line-shaped in terms of a line-shape score (LSS), and that the map learned by
ResNet is closer to the optimal transport map in terms of an optimal
transport score (OTS). In a word, we conclude that a mathematical principle
of deep learning is to learn the geodesic curve in the Wasserstein space, and
that deep learning is a great engineering realization of continuous
transformation in high-dimensional space.
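To make these objects concrete, here is a minimal numpy sketch of the Wasserstein geodesic between two one-dimensional empirical measures via displacement interpolation, together with a hypothetical straightness measure in the spirit of the line-shape score. The abstract does not give the exact LSS or OTS formulas, so the definitions below (monotone-rearrangement OT map, chord-to-arc-length ratio) are illustrative assumptions, not the authors' method.

```python
import numpy as np

def ot_map_1d(source, target):
    """Optimal transport map between 1-D empirical measures of equal size:
    match the i-th smallest source point to the i-th smallest target point
    (monotone rearrangement)."""
    src_order = np.argsort(source)
    mapped = np.empty_like(source)
    mapped[src_order] = np.sort(target)
    return mapped

def displacement_interpolation(source, target, ts):
    """Points on the Wasserstein geodesic: x_t = (1 - t) * x + t * T(x)."""
    T_x = ot_map_1d(source, target)
    return [(1.0 - t) * source + t * T_x for t in ts]

def line_shape_score(track):
    """Hypothetical straightness measure of one sample's track (a sequence
    of d-dimensional points): chord length over arc length, so 1.0 means a
    perfectly straight, geodesic-like track."""
    track = np.asarray(track)
    arc = np.linalg.norm(np.diff(track, axis=0), axis=1).sum()
    chord = np.linalg.norm(track[-1] - track[0])
    return chord / arc if arc > 0 else 1.0

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=256)   # samples from the input distribution
y = rng.normal(4.0, 2.0, size=256)   # samples from the output distribution
geodesic = displacement_interpolation(x, y, np.linspace(0.0, 1.0, 6))

tracks = np.stack(geodesic)              # shape (time, n_samples)
one_track = tracks[:, 0][:, None]        # one sample's track, as d=1 points
print("LSS of a geodesic track:", line_shape_score(one_track))  # ~1.0
```

Under displacement interpolation every sample moves along a straight segment, which is exactly the line-shape behavior the abstract reports for the data tracks of trained networks.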
Related papers
- Mathematical Modeling and Convergence Analysis of Deep Neural Networks with Dense Layer Connectivities in Deep Learning [1.5516092077598485]
In deep learning, dense layer connectivity has become a key design principle in deep neural networks (DNNs). In this work, we model densely connected DNNs mathematically and analyze their learning problems in the deep-layer limit.
arXiv Detail & Related papers (2025-10-02T14:22:51Z) - Deep Learning as Ricci Flow [38.27936710747996]
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data.
We show that the transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow.
Our findings motivate the use of tools from differential and discrete geometry for the problem of explainability in deep learning.
arXiv Detail & Related papers (2024-04-22T15:12:47Z) - Deep Networks Always Grok and Here is Why [15.327649172531606]
Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near-zero training error.
We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings.
arXiv Detail & Related papers (2024-02-23T18:59:31Z) - From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport [32.39176908225668]
We introduce the concept of the non-linearity signature of a DNN, the first theoretically sound solution for measuring the non-linearity of deep neural networks.
We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature; a toy affine-fit proxy is sketched after this list.
arXiv Detail & Related papers (2023-10-17T17:50:22Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Universal Neural Optimal Transport [0.0]
UNOT (Universal Neural Optimal Transport) is a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. We show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm, with speedups of up to $7.4\times$; a warm-started Sinkhorn sketch appears after this list.
arXiv Detail & Related papers (2022-11-30T21:56:09Z) - Analysis of Convolutions, Non-linearity and Depth in Graph Neural
Networks using Neural Tangent Kernel [8.824340350342512]
Graph Neural Networks (GNNs) are designed to exploit the structural information of the data by aggregating information from neighboring nodes.
We theoretically analyze the influence of different aspects of the GNN architecture using the graph neural tangent kernel in a semi-supervised node classification setting.
We prove that: (i) linear networks capture the class information as well as ReLU networks; (ii) row normalization preserves the underlying class structure better than other convolutions; (iii) performance degrades with network depth due to over-smoothing; (iv) skip connections retain the class information even at infinite depth, thereby eliminating over-smoothing.
arXiv Detail & Related papers (2022-10-18T12:28:37Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Wide and Deep Graph Neural Network with Distributed Online Learning [174.8221510182559]
Graph neural networks (GNNs) are naturally distributed architectures for learning representations from network data.
Online learning can be leveraged to retrain GNNs at testing time to overcome this issue.
This paper develops the Wide and Deep GNN (WD-GNN), a novel architecture that can be updated with distributed online learning mechanisms.
arXiv Detail & Related papers (2021-07-19T23:56:48Z) - Optimization of Graph Neural Networks: Implicit Acceleration by Skip
Connections and More Depth [57.10183643449905]
Graph Neural Networks (GNNs) have been studied through the lenses of expressive power and generalization.
We study the training dynamics of GNNs, focusing on how skip connections and greater depth affect optimization.
Our results provide the first theoretical support for the success of GNNs.
arXiv Detail & Related papers (2021-05-10T17:59:01Z) - Fast Learning of Graph Neural Networks with Guaranteed Generalizability:
One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z) - Fractional Deep Neural Network via Constrained Optimization [0.0]
This paper introduces a novel algorithmic framework for a deep neural network (DNN).
Fractional-DNN can be viewed as a time-discretization of a fractional-in-time nonlinear ordinary differential equation (ODE); an Euler-step sketch of the integer-order analogue appears after this list.
arXiv Detail & Related papers (2020-04-01T21:58:21Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems: compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced computational time by 2 to 5 orders of magnitude compared with directly using conventional methods and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
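On the non-linearity signature above: its definition uses affine optimal transport, whose details are not given in the summary. As a stand-in, the sketch below scores a map's non-linearity by the relative residual of its best least-squares affine fit; `affine_residual_score` is a hypothetical proxy, not the paper's measure.

```python
import numpy as np

def affine_residual_score(X, Y):
    """Hypothetical non-linearity proxy: fit the best affine map
    Y ~ X @ A + c by least squares and report the relative residual.
    0 means the map X -> Y is affine; larger means more non-linear."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    resid = Y - X1 @ coef
    return np.linalg.norm(resid) / np.linalg.norm(Y - Y.mean(axis=0))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
print(affine_residual_score(X, X @ rng.normal(size=(16, 16)) + 1.0))  # ~0: affine
print(affine_residual_score(X, np.maximum(X, 0.0)))                   # > 0: ReLU
```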
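On the UNOT entry: the learned initializer itself is not reproduced here. Below is a minimal numpy sketch of entropic Sinkhorn iterations, with an optional `init_g` argument standing in for a network-predicted dual potential; the argument is a hypothetical hook for warm-starting, not UNOT's API.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500, init_g=None):
    """Entropic OT between discrete measures a, b with cost matrix C.
    Returns the transport plan. `init_g` warm-starts the second dual
    potential (UNOT predicts such an initialization with a network)."""
    K = np.exp(-C / eps)                            # Gibbs kernel
    v = np.ones_like(b) if init_g is None else np.exp(init_g / eps)
    for _ in range(n_iters):                        # alternating scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]              # diag(u) K diag(v)

rng = np.random.default_rng(0)
n = 50
a = np.full(n, 1.0 / n)                             # uniform source weights
b = np.full(n, 1.0 / n)                             # uniform target weights
x = rng.normal(size=(n, 2))
y = rng.normal(loc=2.0, size=(n, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
C = C / C.max()                                     # normalize for stability

P = sinkhorn(a, b, C)
print("entropic OT cost:", (P * C).sum())
```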
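On the Fractional-DNN entry: it builds on the standard reading of a residual network as a time-discretization of an ODE, which is also the dynamical-system view of the main paper. A minimal sketch of the integer-order case follows: a ResNet forward pass x_{k+1} = x_k + h * f(x_k, theta_k) is explicit Euler integration of dx/dt = f(x, theta(t)). The fractional (memory-carrying) time derivative is not reproduced, and the tanh layer f is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth, h = 4, 20, 0.1                 # width, number of blocks, step size
Ws = rng.normal(scale=0.3, size=(depth, d, d))
bs = np.zeros((depth, d))

def f(x, W, b):
    """Velocity field of the ODE dx/dt = f(x, theta(t)): a tanh layer."""
    return np.tanh(x @ W.T + b)

def resnet_forward(x):
    """ResNet forward pass as explicit Euler integration:
    x_{k+1} = x_k + h * f(x_k, theta_k)."""
    track = [x]
    for W, b in zip(Ws, bs):
        x = x + h * f(x, W, b)
        track.append(x)
    return np.stack(track)               # the "data track" through depth

x0 = rng.normal(size=(8, d))             # a batch of eight inputs
track = resnet_forward(x0)
print(track.shape)                       # (depth + 1, batch, d)
```

The stacked `track` is exactly the per-sample trajectory whose straightness the main paper scores with the LSS.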
This list is automatically generated from the titles and abstracts of the papers on this site.