Optimization Landscapes of Wide Deep Neural Networks Are Benign
        - URL: http://arxiv.org/abs/2010.00885v2
 - Date: Wed, 13 Jan 2021 10:11:53 GMT
 - Title: Optimization Landscapes of Wide Deep Neural Networks Are Benign
 - Authors: Johannes Lederer
 - Abstract summary: We show that empirical-risk minimization over such networks has no confined points, that is, suboptimal parameters that are difficult to escape from.
Our theories substantiate the common belief that wide neural networks are not only highly expressive but also comparably easy to optimize.
 - Score: 1.52292571922932
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   We analyze the optimization landscapes of deep learning with wide networks.
We highlight the importance of constraints for such networks and show that
constrained -- as well as unconstrained -- empirical-risk minimization over such
networks has no confined points, that is, suboptimal parameters that are
difficult to escape from. Hence, our theories substantiate the common belief
that wide neural networks are not only highly expressive but also comparably
easy to optimize.
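To make the result concrete, below is a minimal sketch of unconstrained empirical-risk minimization over a wide one-hidden-layer ReLU network trained with plain full-batch gradient descent. The architecture, data, width, and step size are illustrative assumptions and not taken from the paper; the sketch only illustrates the kind of landscape the result describes, where gradient descent on a sufficiently wide network does not get stuck at suboptimal parameters.

    # A minimal sketch (not the paper's construction): unconstrained empirical-risk
    # minimization over a wide one-hidden-layer ReLU network with plain gradient
    # descent. Data, width, and step size are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, width = 64, 5, 2048                         # few samples, very wide hidden layer

    X = rng.normal(size=(n, d))
    y = np.sin(X[:, :1])                              # smooth toy regression target

    W = rng.normal(size=(d, width)) / np.sqrt(d)      # input-to-hidden weights
    v = rng.normal(size=(width, 1)) / np.sqrt(width)  # hidden-to-output weights

    def empirical_risk(W, v):
        h = np.maximum(X @ W, 0.0)                    # ReLU features
        return 0.5 * np.mean((h @ v - y) ** 2)

    lr = 1e-3
    for step in range(5001):                          # full-batch gradient descent
        h = np.maximum(X @ W, 0.0)
        err = (h @ v - y) / n                         # d(risk)/d(predictions)
        grad_v = h.T @ err
        grad_W = X.T @ ((err @ v.T) * (h > 0))        # backprop through the ReLU
        v -= lr * grad_v
        W -= lr * grad_W
        if step % 1000 == 0:
            print(f"step {step}: empirical risk {empirical_risk(W, v):.6f}")
    # The risk decreases steadily; with enough width and steps it typically
    # approaches zero, consistent with the absence of confined points.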
 
       
      
        Related papers
        - Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement   Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures.
Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
arXiv  Detail & Related papers  (2025-06-20T17:54:24Z) - Error Bound Analysis for the Regularized Loss of Deep Linear Neural   Networks [4.6485895241404585]
We characterize the local geometric landscape of the regularized squared loss of deep linear networks.
We identify the sufficient and necessary conditions under which the error bound holds.
We conduct numerical experiments demonstrating that gradient descent exhibits linear convergence when minimizing the regularized loss of deep linear networks.
arXiv  Detail & Related papers  (2025-02-16T14:53:52Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem (see the sketch after this list).
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv  Detail & Related papers  (2024-09-21T15:30:43Z) - Restricted Bayesian Neural Network [0.0]
This study explores the concept of Bayesian Neural Networks, presenting a novel architecture designed to significantly alleviate the storage space complexity of a network.
We introduce an algorithm adept at efficiently handling uncertainties, ensuring robust convergence values without becoming trapped in local optima.
arXiv  Detail & Related papers  (2024-03-06T19:09:11Z) - No Wrong Turns: The Simple Geometry Of Neural Networks Optimization
  Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of quantities sampled during optimization along two key paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv  Detail & Related papers  (2023-06-20T22:10:40Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv  Detail & Related papers  (2023-03-16T21:06:13Z) - Singular Value Perturbation and Deep Network Optimization [29.204852309828006]
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
arXiv  Detail & Related papers  (2022-03-07T02:09:39Z) - Towards Understanding Theoretical Advantages of Complex-Reaction
  Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv  Detail & Related papers  (2021-08-15T10:13:49Z) - DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction [0.0]
A large effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates.
We present a systemic approach to design network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution.
arXiv  Detail & Related papers  (2021-07-09T23:15:34Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv  Detail & Related papers  (2020-08-19T04:53:31Z) - Generalization bound of globally optimal non-convex neural network
  training: Transportation map estimation by infinite dimensional Langevin
  dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite width of the network to show its global convergence.
arXiv  Detail & Related papers  (2020-07-11T18:19:50Z) - Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs).
In this paper, we take a step forward to address this issue by leveraging adaptive inference networks for deep SISR (AdaDSR).
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv  Detail & Related papers  (2020-04-08T10:08:20Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
  Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, although the optimization landscape is known to be highly non-convex.
We propose a new continuum limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv  Detail & Related papers  (2020-03-11T20:14:47Z) 
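As noted in the component-based sketching entry above, that line of work reduces deep-net training to a linear empirical-risk minimization problem. The sketch below illustrates the general idea under assumed details and is not the paper's sketching scheme: randomly drawn ReLU components are frozen and only the linear output weights are fit, so training becomes ridge-regularized least squares; the sizes and the ridge parameter are illustrative.

    # Minimal sketch: freeze randomly drawn hidden components and fit only the
    # linear output weights, so "training" becomes a linear empirical-risk
    # (ridge least-squares) problem. Not the paper's scheme; sizes are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, width = 200, 10, 512

    X = rng.normal(size=(n, d))
    y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(n, 1))  # toy regression target

    W_fixed = rng.normal(size=(d, width)) / np.sqrt(d)     # frozen random components
    Phi = np.maximum(X @ W_fixed, 0.0)                     # fixed ReLU feature map

    lam = 1e-3                                             # ridge regularization
    v = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)

    print("training risk of the linearized problem:",
          0.5 * np.mean((Phi @ v - y) ** 2))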
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     