Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
- URL: http://arxiv.org/abs/2408.13767v2
- Date: Wed, 6 Nov 2024 15:02:37 GMT
- Title: Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
- Authors: Nadav Cohen, Noam Razin
- Abstract summary: These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning.
They present a theory (developed by NC, NR and collaborators) of linear neural networks -- a fundamental model in the study of optimization and generalization in deep learning.
- Score: 14.909298522361306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning. They present a theory (developed by NC, NR and collaborators) of linear neural networks -- a fundamental model in the study of optimization and generalization in deep learning. Practical applications born from the presented theory are also discussed. The theory is based on mathematical tools that are dynamical in nature. It showcases the potential of such tools to push the envelope of our understanding of optimization and generalization in deep learning. The text assumes familiarity with the basics of statistical learning theory. Exercises (without solutions) are included.
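To make the model class concrete, below is a minimal sketch of a deep linear neural network trained with gradient descent. This is not code from the notes; the depth, dimensions, initialization scale, learning rate, and data are illustrative assumptions.

```python
import numpy as np

# Minimal depth-3 linear neural network: inputs are mapped through the product
# of three weight matrices, with no nonlinearities. All constants are assumed.
rng = np.random.default_rng(0)
d, n = 10, 50
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, d))          # targets from a random linear map

Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(3)]   # three factors
lr = 0.01

for step in range(4001):
    # Forward pass: apply the factors in sequence.
    H = [X]
    for W in Ws:
        H.append(H[-1] @ W)
    residual = H[-1] - Y
    loss = 0.5 * np.mean(np.sum(residual ** 2, axis=1))

    # Backward pass: gradient of the squared loss with respect to each factor.
    G = residual / n
    grads = []
    for i in reversed(range(len(Ws))):
        grads.append(H[i].T @ G)
        G = G @ Ws[i].T
    grads.reverse()
    for W, g in zip(Ws, grads):
        W -= lr * g

    if step % 1000 == 0:
        print(f"step {step:5d}  loss {loss:.6f}")
```

The factored (overparameterized) parametrization, rather than a single weight matrix, is what makes the gradient descent trajectory non-trivial, and it is this trajectory that the dynamical tools mentioned in the abstract are used to analyze.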
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
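As a loose illustration of the general idea of integrating learning dynamics as a differential equation without an external solver, here is a sketch of plain gradient flow discretized with explicit Euler steps; it is not the Hamiltonian Learning equations, and the model, data, and step size are assumptions made for illustration.

```python
import numpy as np

# Generic gradient-flow dynamics d(theta)/dt = -grad L(theta), integrated with
# explicit Euler steps (no external ODE solver). Purely illustrative; not the
# paper's Hamiltonian Learning equations.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(100)

w = np.zeros(3)
dt = 0.01                                 # Euler step size (assumed)
for t in range(500):
    grad = X.T @ (X @ w - y) / len(y)     # gradient of 0.5 * mean squared error
    w = w - dt * grad                     # one explicit Euler step of the ODE
print("estimated weights:", np.round(w, 3))
```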
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
- Artificial Neural Network and Deep Learning: Fundamentals and Theory [0.0]
This book lays a solid groundwork for understanding data and probability distributions.
The book delves into multilayer feed-forward neural networks, explaining their architecture, training processes, and the backpropagation algorithm.
The text covers various learning rate schedules and adaptive algorithms, providing strategies to optimize the training process.
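As a concrete illustration of the kind of learning rate schedule such a text covers, here is a small sketch of step decay and cosine annealing; the constants are arbitrary assumptions rather than values from the book.

```python
import math

def step_decay(step, base_lr=0.1, drop=0.5, every=30):
    """Halve the learning rate every `every` steps (illustrative constants)."""
    return base_lr * (drop ** (step // every))

def cosine_annealing(step, base_lr=0.1, total_steps=100):
    """Cosine decay from base_lr toward 0 over total_steps (illustrative constants)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * min(step, total_steps) / total_steps))

for s in (0, 30, 60, 90):
    print(s, round(step_decay(s), 4), round(cosine_annealing(s), 4))
```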
arXiv Detail & Related papers (2024-08-12T21:06:59Z)
- Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning.
In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures.
Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
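One simple diagnostic behind such rank-based notions is the spectrum of a learned matrix. Below is a hedged sketch that estimates effective rank from singular values; the threshold is an arbitrary assumption, and the example matrix is synthetic rather than a trained network.

```python
import numpy as np

def effective_rank(M, tol=1e-3):
    """Number of singular values above tol times the largest singular value."""
    s = np.linalg.svd(M, compute_uv=False)   # singular values in descending order
    return int(np.sum(s > tol * s[0]))

# Example: a product of two narrow factors has rank at most 2,
# regardless of its 10 x 10 shape.
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 10))
print("effective rank:", effective_rank(M))   # prints 2
```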
arXiv Detail & Related papers (2024-08-04T18:47:55Z)
- TASI Lectures on Physics for Machine Learning [0.0]
Notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning.
The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics.
arXiv Detail & Related papers (2024-07-31T18:00:22Z)
- Mathematical theory of deep learning [0.46040036610482665]
It covers fundamental results in approximation theory, optimization theory, and statistical learning theory.
The book aims to equip readers with foundational knowledge on the topic.
arXiv Detail & Related papers (2024-07-25T20:37:12Z)
- Foundations and Frontiers of Graph Learning Theory [81.39078977407719]
Recent advancements in graph learning have revolutionized the way we understand and analyze data with complex structures.
Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm.
This article provides a comprehensive summary of the theoretical foundations and breakthroughs concerning the approximation and learning behaviors intrinsic to prevalent graph learning models.
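To make the message-passing paradigm concrete, here is a hedged numpy sketch of a single GNN layer with mean aggregation; the aggregation rule, nonlinearity, and dimensions are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def gnn_layer(A, H, W):
    """One message-passing layer: each node averages features over itself and
    its neighbors, then applies a learned linear map and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])             # adjacency with self-loops
    deg = A_hat.sum(axis=1, keepdims=True)     # node degrees (for mean aggregation)
    H_agg = (A_hat @ H) / deg                  # average neighbor features
    return np.maximum(0.0, H_agg @ W)          # linear map + ReLU

# Tiny example: a 4-node path graph with 3-dimensional node features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 8))
print(gnn_layer(A, H, W).shape)   # (4, 8)
```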
arXiv Detail & Related papers (2024-07-03T14:07:41Z)
- Envisioning Future Deep Learning Theories: Some Basic Concepts and Characteristics [30.365274034429508]
We argue that a future deep learning theory should inherit three characteristics: a hierarchically structured network architecture, parameters iteratively optimized using gradient-based methods, and information from the data that evolves compressively.
We integrate these characteristics into a graphical model called neurashed, which effectively explains some common empirical patterns in deep learning.
arXiv Detail & Related papers (2021-12-17T19:51:26Z)
- Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network toward a desired output target; the resulting control signal is used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
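As a very loose, heavily simplified illustration of the control-based idea: a proportional controller nudges a single linear neuron's output toward its target, and the settled control signal (rather than a backpropagated error) drives the weight update. Gains, learning rate, and data are assumed; this is not the DFC algorithm or its multi-compartment neuron model.

```python
import numpy as np

# Toy control-style credit assignment for one linear neuron. NOT the DFC method.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)                  # unit-norm input keeps the toy example stable
target = 1.0
w = np.zeros(5)
k_p, lr = 0.5, 0.1                      # controller gain and learning rate (assumed)

for _ in range(200):
    u = 0.0                             # control signal
    for _ in range(50):                 # let the controlled output settle
        y = w @ x + u                   # output nudged by the controller
        u += k_p * (target - y)         # proportional feedback toward the target
    w += lr * u * x                     # weight update driven by the control signal
print("output without control:", round(float(w @ x), 3), "target:", target)
```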
arXiv Detail & Related papers (2021-06-15T05:30:17Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
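For a concrete sense of the quantities involved, below is a hedged sketch that computes the empirical neural tangent kernel of a tiny one-hidden-layer network as the inner product of its parameter gradients at two inputs; the architecture, width, and initialization are illustrative assumptions.

```python
import numpy as np

# Empirical neural tangent kernel of a tiny one-hidden-layer network
# f(x) = v . tanh(W x), computed from explicit parameter gradients.
# Width, inputs, and initialization scale are illustrative assumptions.
rng = np.random.default_rng(0)
d, width = 3, 64
W = rng.standard_normal((width, d)) / np.sqrt(d)
v = rng.standard_normal(width) / np.sqrt(width)

def param_gradient(x):
    """Gradient of f(x) with respect to all parameters, flattened."""
    h = np.tanh(W @ x)
    grad_v = h                                  # df/dv
    grad_W = np.outer(v * (1 - h ** 2), x)      # df/dW
    return np.concatenate([grad_v, grad_W.ravel()])

def ntk(x1, x2):
    """Empirical NTK: inner product of parameter gradients at two inputs."""
    return float(param_gradient(x1) @ param_gradient(x2))

x_a, x_b = rng.standard_normal(d), rng.standard_normal(d)
print("K(a, a) =", round(ntk(x_a, x_a), 4))
print("K(a, b) =", round(ntk(x_a, x_b), 4))
```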
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups; among them are: (1) complexity and capacity-based approaches for analysing the generalizability of deep learning; (2) differential equations and their dynamic systems for modelling gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; and (5) theoretical foundations of several special structures in network architectures.
arXiv Detail & Related papers (2020-12-20T14:16:41Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.