A Study of the Mathematics of Deep Learning
- URL: http://arxiv.org/abs/2104.14033v1
- Date: Wed, 28 Apr 2021 22:05:54 GMT
- Title: A Study of the Mathematics of Deep Learning
- Authors: Anirbit Mukherjee
- Abstract summary: "Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks.
This thesis takes several steps towards building strong theoretical foundations for these new paradigms of deep learning.
- Score: 1.14219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Deep Learning"/"Deep Neural Nets" is a technological marvel that is now
increasingly deployed at the cutting-edge of artificial intelligence tasks.
The dramatic success of deep learning in the last few years has hinged on an
enormous amount of heuristics, and rigorously explaining them has turned out to
be a serious mathematical challenge. In this thesis, submitted to the
Department of Applied Mathematics and Statistics, Johns Hopkins University, we
take several steps towards building strong theoretical foundations for these
new paradigms of deep learning. In chapter 2 we show new
circuit complexity theorems for deep neural functions and prove classification
theorems about these function spaces which in turn lead to exact algorithms for
empirical risk minimization for depth 2 ReLU nets. We also motivate a measure
of complexity of neural functions to constructively establish the existence of
high-complexity neural functions. In chapter 3 we give the first algorithm
which can train a ReLU gate in the realizable setting in linear time in an
almost distribution-free setup. In chapter 4 we give rigorous proofs towards
explaining the phenomenon of autoencoders being able to do sparse-coding. In
chapter 5 we give the first-of-its-kind proofs of convergence for stochastic
and deterministic versions of the widely used adaptive gradient deep-learning
algorithms, RMSProp and ADAM. This chapter also includes a detailed empirical
study, on autoencoders, of the hyper-parameter values at which these modern
algorithms have a significant advantage over classical acceleration-based
methods. In the last chapter, chapter 6, we give new and improved PAC-Bayesian
bounds for the risk of stochastic neural nets. This chapter also includes an
experimental investigation revealing new geometric properties of the paths in
weight space that are traced out by the net during training.
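For context on chapter 5, here is a minimal sketch of the optimizers in question, in their textbook form (step size $\alpha$, decay parameters $\beta_1, \beta_2 \in [0,1)$, stability constant $\epsilon > 0$, stochastic gradient $g_t$ at iterate $w_t$, all operations coordinate-wise); the thesis may analyze specific variants or hyper-parameter regimes of these updates:

$$\text{RMSProp:}\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\qquad w_{t+1} = w_t - \frac{\alpha}{\sqrt{v_t}+\epsilon}\,g_t$$

$$\text{ADAM:}\quad m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t,\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2,\quad w_{t+1} = w_t - \alpha\,\frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)}+\epsilon}$$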
Related papers
- Artificial Neural Network and Deep Learning: Fundamentals and Theory [0.0]
This book lays a solid groundwork for understanding data and probability distributions.
The book delves into multilayer feed-forward neural networks, explaining their architecture, training processes, and the backpropagation algorithm.
The text covers various learning rate schedules and adaptive algorithms, providing strategies to optimize the training process.
arXiv Detail & Related papers (2024-08-12T21:06:59Z)
- Reasoning Algorithmically in Graph Neural Networks [1.8130068086063336]
We aim to integrate the structured and rule-based reasoning of algorithms with the adaptive learning capabilities of neural networks.
This dissertation provides theoretical and practical contributions to this area of research.
arXiv Detail & Related papers (2024-02-21T12:16:51Z)
- ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strength of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
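As a reminder of the property being certified here (the standard definition, not anything specific to this paper): a map $f$ is 1-Lipschitz in the Euclidean norm when

$$\|f(x) - f(y)\|_2 \le \|x - y\|_2 \quad \text{for all inputs } x, y,$$

and constructions such as the SDP condition above aim to guarantee this bound by design rather than estimate it after training.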
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Information Flow in Deep Neural Networks [0.6922389632860545]
There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
arXiv Detail & Related papers (2022-02-10T23:32:26Z)
- Error Bounds for a Matrix-Vector Product Approximation with Deep ReLU Neural Networks [0.0]
The success of deep learning has spurred theoretical developments of considerable depth and breadth.
Motivated by such developments, we pose a fundamental question: can we accurately approximate an arbitrary matrix-vector product using deep rectified linear unit (ReLU) feedforward neural networks (FNNs)?
We derive error bounds in Lebesgue and Sobolev norms that comprise our developed deep approximation theory.
The developed theory is also applicable for guiding and easing the training of teacher deep ReLU FNNs in view of the emerging teacher-student AI or ML paradigms.
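A minimal sketch of why exact representation of a linear map by a ReLU FNN is possible at all (a standard identity, offered only as background; the cited work's contribution is the quantitative error bounds stated above): since $z = \mathrm{ReLU}(z) - \mathrm{ReLU}(-z)$ for every real $z$, applying this coordinate-wise gives, for any matrix $A \in \mathbb{R}^{m \times n}$,

$$Ax = \mathrm{ReLU}(Ax) - \mathrm{ReLU}(-Ax),$$

i.e. a one-hidden-layer ReLU FNN of width $2m$ with hidden weights $[A; -A]$ and output weights $[I, -I]$ computes $x \mapsto Ax$ exactly.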
arXiv Detail & Related papers (2021-11-25T08:14:55Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
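For reference, a linear threshold activation is the step function (one common convention; the exact treatment of the boundary may differ in the paper):

$$\sigma(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{otherwise,} \end{cases}$$

so, with a linear output layer, such networks compute piecewise-constant functions of their input.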
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)