A Study of the Mathematics of Deep Learning
- URL: http://arxiv.org/abs/2104.14033v1
- Date: Wed, 28 Apr 2021 22:05:54 GMT
- Title: A Study of the Mathematics of Deep Learning
- Authors: Anirbit Mukherjee
- Abstract summary: "Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks.
This thesis takes several steps towards building strong theoretical foundations for these new paradigms of deep learning.
- Score: 1.14219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Deep Learning"/"Deep Neural Nets" is a technological marvel that is now
increasingly deployed at the cutting-edge of artificial intelligence tasks.
The dramatic success of deep learning in the last few years has hinged on an
enormous amount of heuristics, and rigorously explaining them has turned out to
be a serious mathematical challenge. In this thesis, submitted to the
Department of Applied Mathematics and Statistics, Johns Hopkins University, we
take several steps towards building strong theoretical foundations for these
new paradigms of deep learning. In chapter 2 we show new
circuit complexity theorems for deep neural functions and prove classification
theorems about these function spaces which in turn lead to exact algorithms for
empirical risk minimization for depth 2 ReLU nets. We also motivate a measure
of complexity of neural functions to constructively establish the existence of
high-complexity neural functions. In chapter 3 we give the first algorithm
which can train a ReLU gate in the realizable setting in linear time in an
almost distribution-free setup. In chapter 4 we give rigorous proofs towards
explaining the phenomenon of autoencoders being able to do sparse-coding. In
chapter 5 we give the first-of-its-kind proofs of convergence for stochastic
and deterministic versions of the widely used adaptive gradient deep-learning
algorithms, RMSProp and ADAM. This chapter also includes a detailed empirical
study, on autoencoders, of the hyper-parameter values at which these modern
algorithms have a significant advantage over classical acceleration-based
methods. In the last chapter, chapter 6, we give new and improved PAC-Bayesian
bounds for the risk of stochastic neural nets. This chapter also includes an
experimental investigation revealing new geometric properties of the paths in
weight space that are traced out by the net during training.
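For context on chapter 5, here is a minimal sketch of the optimizers in question, in their textbook form (step size $\alpha$, decay parameters $\beta_1, \beta_2 \in [0,1)$, stability constant $\epsilon > 0$, stochastic gradient $g_t$ at iterate $w_t$, all operations coordinate-wise); the thesis may analyze specific variants or hyper-parameter regimes of these updates:

$$\text{RMSProp:}\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\qquad w_{t+1} = w_t - \frac{\alpha}{\sqrt{v_t}+\epsilon}\,g_t$$

$$\text{ADAM:}\quad m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t,\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\quad w_{t+1} = w_t - \alpha\,\frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)}+\epsilon}$$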
Related papers
- Artificial Neural Network and Deep Learning: Fundamentals and Theory [0.0]
This book lays a solid groundwork for understanding data and probability distributions.
The book delves into multilayer feed-forward neural networks, explaining their architecture, training processes, and the backpropagation algorithm.
The text covers various learning rate schedules and adaptive algorithms, providing strategies to optimize the training process.
arXiv Detail & Related papers (2024-08-12T21:06:59Z)
- Reasoning Algorithmically in Graph Neural Networks [1.8130068086063336]
We aim to integrate the structured and rule-based reasoning of algorithms with the adaptive learning capabilities of neural networks.
This dissertation provides theoretical and practical contributions to this area of research.
arXiv Detail & Related papers (2024-02-21T12:16:51Z)
- ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strength of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
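As a reminder of the property being certified here (the standard definition, not anything specific to this paper): a map $f$ is 1-Lipschitz in the Euclidean norm when

$$\|f(x) - f(y)\|_2 \le \|x - y\|_2 \quad \text{for all inputs } x, y,$$

and constructions such as the SDP condition above aim to guarantee this bound by design rather than estimate it after training.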
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Information Flow in Deep Neural Networks [0.6922389632860545]
There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
arXiv Detail & Related papers (2022-02-10T23:32:26Z)
- Error Bounds for a Matrix-Vector Product Approximation with Deep ReLU Neural Networks [0.0]
The success of deep learning has spurred theoretical developments of considerable depth and breadth.
Motivated by such developments, we pose a fundamental question: can we accurately approximate an arbitrary matrix-vector product using deep rectified linear unit (ReLU) feedforward neural networks (FNNs)?
We derive error bounds in Lebesgue and Sobolev norms that comprise our developed deep approximation theory.
The developed theory is also applicable for guiding and easing the training of teacher deep ReLU FNNs in view of the emerging teacher-student AI or ML paradigms.
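A minimal sketch of why exact representation of a linear map by a ReLU FNN is possible at all (a standard identity, offered only as background; the cited work's contribution is the quantitative error bounds stated above): since $z = \mathrm{ReLU}(z) - \mathrm{ReLU}(-z)$ for every real $z$, applying this coordinate-wise gives, for any matrix $A \in \mathbb{R}^{m \times n}$,

$$Ax = \mathrm{ReLU}(Ax) - \mathrm{ReLU}(-Ax),$$

i.e. a one-hidden-layer ReLU FNN of width $2m$ with hidden weights $[A; -A]$ and output weights $[I, -I]$ computes $x \mapsto Ax$ exactly.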
arXiv Detail & Related papers (2021-11-25T08:14:55Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
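For reference, a linear threshold activation is the step function (one common convention; the exact treatment of the boundary may differ in the paper):

$$\sigma(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{otherwise,} \end{cases}$$

so, with a linear output layer, such networks compute piecewise-constant functions of their input.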
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)