Optimization Landscapes of Wide Deep Neural Networks Are Benign
        - URL: http://arxiv.org/abs/2010.00885v2
 - Date: Wed, 13 Jan 2021 10:11:53 GMT
 - Title: Optimization Landscapes of Wide Deep Neural Networks Are Benign
 - Authors: Johannes Lederer
 - Abstract summary: We show that empirical-risk minimization over such networks has no confined points, that is, suboptimal parameters that are difficult to escape from.
Our theories substantiate the common belief that wide neural networks are not only highly expressive but also comparably easy to optimize.
 - Score: 1.52292571922932
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   We analyze the optimization landscapes of deep learning with wide networks.
We highlight the importance of constraints for such networks and show that
constrained -- as well as unconstrained -- empirical-risk minimization over such
networks has no confined points, that is, suboptimal parameters that are
difficult to escape from. Hence, our theories substantiate the common belief
that wide neural networks are not only highly expressive but also comparably
easy to optimize.
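To make the result concrete, below is a minimal sketch of unconstrained empirical-risk minimization over a wide one-hidden-layer ReLU network trained with plain full-batch gradient descent. The architecture, data, width, and step size are illustrative assumptions and not taken from the paper; the sketch only illustrates the kind of landscape the result describes, where gradient descent on a sufficiently wide network does not get stuck at suboptimal parameters.

    # A minimal sketch (not the paper's construction): unconstrained empirical-risk
    # minimization over a wide one-hidden-layer ReLU network with plain gradient
    # descent. Data, width, and step size are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, width = 64, 5, 2048                         # few samples, very wide hidden layer

    X = rng.normal(size=(n, d))
    y = np.sin(X[:, :1])                              # smooth toy regression target

    W = rng.normal(size=(d, width)) / np.sqrt(d)      # input-to-hidden weights
    v = rng.normal(size=(width, 1)) / np.sqrt(width)  # hidden-to-output weights

    def empirical_risk(W, v):
        h = np.maximum(X @ W, 0.0)                    # ReLU features
        return 0.5 * np.mean((h @ v - y) ** 2)

    lr = 1e-3
    for step in range(5001):                          # full-batch gradient descent
        h = np.maximum(X @ W, 0.0)
        err = (h @ v - y) / n                         # d(risk)/d(predictions)
        grad_v = h.T @ err
        grad_W = X.T @ ((err @ v.T) * (h > 0))        # backprop through the ReLU
        v -= lr * grad_v
        W -= lr * grad_W
        if step % 1000 == 0:
            print(f"step {step}: empirical risk {empirical_risk(W, v):.6f}")
    # The risk decreases steadily; with enough width and steps it typically
    # approaches zero, consistent with the absence of confined points.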
 
       
      
        Related papers
        - Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement   Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures.
Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
arXiv  Detail & Related papers  (2025-06-20T17:54:24Z) - Error Bound Analysis for the Regularized Loss of Deep Linear Neural   Networks [4.6485895241404585]
We characterize the local geometric landscape of the regularized squared loss of deep linear networks.
We identify the sufficient and necessary conditions under which the error bound holds.
We conduct numerical experiments demonstrating that gradient descent exhibits linear convergence when minimizing the regularized loss of deep linear networks.
arXiv  Detail & Related papers  (2025-02-16T14:53:52Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem (see the sketch after this list).
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv  Detail & Related papers  (2024-09-21T15:30:43Z) - Restricted Bayesian Neural Network [0.0]
This study explores the concept of Bayesian Neural Networks, presenting a novel architecture designed to significantly alleviate the storage space complexity of a network.
We introduce an algorithm adept at efficiently handling uncertainties, ensuring robust convergence values without becoming trapped in local optima.
arXiv  Detail & Related papers  (2024-03-06T19:09:11Z) - No Wrong Turns: The Simple Geometry Of Neural Networks Optimization
  Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of quantities sampled during optimization along two key paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv  Detail & Related papers  (2023-06-20T22:10:40Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv  Detail & Related papers  (2023-03-16T21:06:13Z) - Singular Value Perturbation and Deep Network Optimization [29.204852309828006]
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
arXiv  Detail & Related papers  (2022-03-07T02:09:39Z) - Towards Understanding Theoretical Advantages of Complex-Reaction
  Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv  Detail & Related papers  (2021-08-15T10:13:49Z) - DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction [0.0]
A large effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates.
We present a systemic approach to design network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution.
arXiv  Detail & Related papers  (2021-07-09T23:15:34Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv  Detail & Related papers  (2020-08-19T04:53:31Z) - Generalization bound of globally optimal non-convex neural network
  training: Transportation map estimation by infinite dimensional Langevin
  dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite width of the network to show its global convergence.
arXiv  Detail & Related papers  (2020-07-11T18:19:50Z) - Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs).
In this paper, we take a step forward to address this issue by leveraging adaptive inference networks for deep SISR (AdaDSR).
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv  Detail & Related papers  (2020-04-08T10:08:20Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
  Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, although the optimization landscape is known to be highly non-convex.
We propose a new continuum limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv  Detail & Related papers  (2020-03-11T20:14:47Z) 
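As noted in the component-based sketching entry above, that line of work reduces deep-net training to a linear empirical-risk minimization problem. The sketch below illustrates the general idea under assumed details and is not the paper's sketching scheme: randomly drawn ReLU components are frozen and only the linear output weights are fit, so training becomes ridge-regularized least squares; the sizes and the ridge parameter are illustrative.

    # Minimal sketch: freeze randomly drawn hidden components and fit only the
    # linear output weights, so "training" becomes a linear empirical-risk
    # (ridge least-squares) problem. Not the paper's scheme; sizes are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, width = 200, 10, 512

    X = rng.normal(size=(n, d))
    y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(n, 1))  # toy regression target

    W_fixed = rng.normal(size=(d, width)) / np.sqrt(d)     # frozen random components
    Phi = np.maximum(X @ W_fixed, 0.0)                     # fixed ReLU feature map

    lam = 1e-3                                             # ridge regularization
    v = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)

    print("training risk of the linearized problem:",
          0.5 * np.mean((Phi @ v - y) ** 2))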
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     