Expressive Power and Loss Surfaces of Deep Learning Models
- URL: http://arxiv.org/abs/2108.03579v2
- Date: Tue, 10 Aug 2021 01:34:42 GMT
- Title: Expressive Power and Loss Surfaces of Deep Learning Models
- Authors: Simant Dube
- Abstract summary: The first goal of this paper is to serve as an expository tutorial on the working of deep learning models.
The second goal is to complement the current results on the expressive power of deep learning models with novel insights and results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The goals of this paper are two-fold. The first goal is to serve as an
expository tutorial on the working of deep learning models which emphasizes
geometrical intuition about the reasons for success of deep learning. The
second goal is to complement the current results on the expressive power of
deep learning models and their loss surfaces with novel insights and results.
In particular, we describe how deep neural networks carve out manifolds,
especially when multiplication neurons are introduced. Multiplication appears
in dot products and in the attention mechanism, and it is employed in capsule
networks and self-attention-based transformers. We also describe how random
polynomial, random matrix, spin glass and computational complexity perspectives
on the loss surfaces are interconnected.
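To make the role of multiplication concrete, below is a minimal NumPy sketch of scaled dot-product self-attention (an illustrative example, not code from the paper): both the query-key dot products and the attention-weighted sum over values multiply one activation by another, rather than multiplying activations only by fixed weights.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention for a single head.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # linear maps of the activations
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # activation-times-activation products
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                            # another activation-times-activation product

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)               # shape (5, 8)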
Related papers
- Generating visual explanations from deep networks using implicit neural representations [0.6056822594090163]
In this work, we demonstrate that implicit neural representations (INRs) constitute a good framework for generating visual explanations.
We present an iterative INR-based method that can be used to generate multiple non-overlapping attribution masks for the same image.
arXiv Detail & Related papers (2025-01-20T23:17:57Z)
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations [1.9580473532948401]
This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process.
We ask what drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality.
Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models.
arXiv Detail & Related papers (2023-10-24T19:50:41Z)
- Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend the residual neural network (ResNet) to general Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z)
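As background for the two ResNet-related entries above and below, here is a minimal NumPy sketch of a plain (Euclidean) residual block, y = x + f(x); the Riemannian paper generalizes this construction to manifolds, and all names and shapes here are illustrative.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, b1, W2, b2):
    # The defining feature of a ResNet: the input is added back to the
    # output of a small learned transformation (a skip connection).
    h = relu(x @ W1 + b1)
    return x + (h @ W2 + b2)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))                      # batch of 4, width 16
W1 = 0.1 * rng.normal(size=(16, 16))
W2 = 0.1 * rng.normal(size=(16, 16))
b1, b2 = np.zeros(16), np.zeros(16)
y = residual_block(x, W1, b1, W2, b2)             # shape (4, 16)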
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
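To make the system-identification setup above concrete, here is a toy NumPy sketch that fits a one-step-ahead predictor y[t] ≈ f(y[t-1], u[t-1]) from input-output data of an unknown system; a linear least-squares map is used purely for illustration, whereas the surveyed deep models replace it with feedforward, convolutional, or recurrent networks.

import numpy as np

rng = np.random.default_rng(2)
T = 200
u = rng.normal(size=T)                        # input signal
y = np.zeros(T)
for t in range(1, T):                         # the unknown "true" system
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + 0.05 * rng.normal()

# Regress y[t] on (y[t-1], u[t-1]) to identify a predictive model.
X = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
y_pred = X @ theta                            # one-step-ahead predictions
print(theta)                                  # close to the true [0.8, 0.5]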
- Convergence Analysis of Deep Residual Networks [3.274290296343038]
Deep Residual Networks (ResNets) are of particular importance because they have demonstrated great usefulness in computer vision.
We aim at characterizing the convergence of deep ResNets as the depth tends to infinity in terms of the parameters of the networks.
arXiv Detail & Related papers (2022-05-13T11:53:09Z)
- Tensor Methods in Computer Vision and Deep Learning [120.3881619902096]
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning.
arXiv Detail & Related papers (2021-07-07T18:42:45Z)
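As a small illustration of the point above, a batch of color images is naturally a 4-dimensional tensor, and common deep-learning operations act on it directly (shapes and names below are purely illustrative).

import numpy as np

# A mini-batch of 8 RGB images of size 32x32 is a 4-D tensor.
images = np.random.default_rng(3).random((8, 32, 32, 3))    # (batch, height, width, channels)
channel_means = images.mean(axis=(0, 1, 2))                 # reduce over batch and space -> (3,)
flattened = images.reshape(8, -1)                           # matricization: (8, 32*32*3)
print(images.shape, channel_means.shape, flattened.shape)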
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$Nets are a new class of function approximators based on polynomial expansions.
$\Pi$Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
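To illustrate what a polynomial expansion of the input looks like, here is a hand-rolled degree-2 feature map followed by a linear layer in NumPy; this is only a sketch of the general idea, not the $\Pi$Nets architecture itself.

import numpy as np

def degree2_features(x):
    # Concatenate the raw features with all pairwise products x_i * x_j
    # (i <= j), i.e. an explicit degree-2 polynomial expansion.
    n = x.shape[-1]
    i, j = np.triu_indices(n)
    return np.concatenate([x, x[..., i] * x[..., j]], axis=-1)

rng = np.random.default_rng(4)
x = rng.normal(size=(10, 5))                  # 10 samples, 5 features
phi = degree2_features(x)                     # shape (10, 5 + 15)
W = rng.normal(size=(phi.shape[-1], 1))
y = phi @ W                                   # linear model in the polynomial features
print(phi.shape, y.shape)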
- Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization [22.696129751033983]
We show that implementing classical empirical risk minimization on deep nets can achieve optimal generalization performance for numerous learning tasks.
Our results are verified by a series of numerical experiments including toy simulations and a real application of earthquake seismic intensity prediction.
arXiv Detail & Related papers (2020-04-01T06:03:01Z)