Singular Value Perturbation and Deep Network Optimization
- URL: http://arxiv.org/abs/2203.03099v1
- Date: Mon, 7 Mar 2022 02:09:39 GMT
- Title: Singular Value Perturbation and Deep Network Optimization
- Authors: Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk
- Abstract summary: We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
- Score: 29.204852309828006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop new theoretical results on matrix perturbation to shed light on
the impact of architecture on the performance of a deep network. In particular,
we explain analytically what deep learning practitioners have long observed
empirically: the parameters of some deep architectures (e.g., residual
networks, ResNets, and Dense networks, DenseNets) are easier to optimize than
others (e.g., convolutional networks, ConvNets). Building on our earlier work
connecting deep networks with continuous piecewise-affine splines, we develop
an exact local linear representation of a deep network layer for a family of
modern deep networks that includes ConvNets at one end of a spectrum and
ResNets and DenseNets at the other. For regression tasks that optimize the
squared-error loss, we show that the optimization loss surface of a modern deep
network is piecewise quadratic in the parameters, with local shape governed by
the singular values of a matrix that is a function of the local linear
representation. We develop new perturbation results for how the singular values
of matrices of this sort behave as we add a fraction of the identity and
multiply by certain diagonal matrices. A direct application of our perturbation
results explains analytically why a ResNet is easier to optimize than a
ConvNet: thanks to its more stable singular values and smaller condition
number, the local loss surface of a ResNet or DenseNet is less erratic, less
eccentric, and features local minima that are more accommodating to
gradient-based optimization. Our results also shed new light on the impact of
different nonlinear activation functions on a deep network's singular values,
regardless of its architecture.
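A rough numerical illustration of the abstract's central claim: the sketch below is a loose analogy, not the paper's exact construction. A generic matrix A stands in for a ConvNet-like local linear operator, I + A for its ResNet-like counterpart (the "fraction of the identity" contributed by a skip connection), and a diagonal matrix D for a leaky-ReLU-style activation pattern; the dimension, scales, and entries of D are arbitrary illustrative choices.

```python
# Minimal sketch (illustrative analogy, not the paper's exact construction):
# compare the conditioning of a ConvNet-like local linear operator A with a
# ResNet-like operator I + A, before and after multiplying by a diagonal
# matrix D that mimics a (leaky-)ReLU activation pattern.
import numpy as np

rng = np.random.default_rng(0)
n = 256
A = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # generic layer matrix
D = np.diag(rng.choice([0.1, 1.0], size=n))          # activation-style diagonal
I = np.eye(n)

def cond(M):
    """Ratio of the largest to the smallest singular value of M."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

print("cond(A)           =", cond(A))            # large: ill-conditioned
print("cond(I + A)       =", cond(I + A))        # modest: identity shift helps
print("cond(D @ A)       =", cond(D @ A))        # remains large
print("cond(D @ (I + A)) =", cond(D @ (I + A)))  # remains modest
```

In this toy setting the identity shift keeps the singular values bounded away from zero, so the ResNet-like operator stays well conditioned even after the diagonal multiplication; this is the qualitative behavior that the paper's perturbation bounds make precise.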
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - PirateNets: Physics-informed Deep Learning with Residual Adaptive
Networks [19.519831541375144]
We introduce Physics-informed Residual Adaptive Networks (PirateNets) to facilitate stable and efficient training of deep PINN models.
PirateNets leverage a novel adaptive residual connection, which allows the networks to be initialized as shallow networks that progressively deepen during training.
We show that PirateNets are easier to optimize and can gain accuracy from considerably increased depth, ultimately achieving state-of-the-art results across various benchmarks.
arXiv Detail & Related papers (2024-02-01T04:17:56Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of training epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - DreamNet: A Deep Riemannian Network based on SPD Manifold Learning for
Visual Classification [36.848148506610364]
We propose a new architecture for SPD matrix learning.
To enrich the deep representations, we adopt SPDNet as the backbone.
We then insert several residual-like blocks with shortcut connections to augment the representational capacity of SRAE.
arXiv Detail & Related papers (2022-06-16T07:15:20Z) - Edge Rewiring Goes Neural: Boosting Network Resilience via Policy
Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
arXiv Detail & Related papers (2021-10-18T06:14:28Z) - ReduNet: A White-box Deep Network from the Principle of Maximizing Rate
Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks (a minimal sketch of this rate-reduction objective appears after this list).
arXiv Detail & Related papers (2021-05-21T16:29:57Z) - Kernel-Based Smoothness Analysis of Residual Networks [85.20737467304994]
Residual networks (ResNets) stand out among these powerful modern architectures.
In this paper, we show another distinction between the two models, namely, a tendency of ResNets to promote smoother interpolations than their non-residual counterparts.
arXiv Detail & Related papers (2020-09-21T16:32:04Z) - Eigendecomposition-Free Training of Deep Networks for Linear
Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, even though the optimization landscape is highly non-convex.
We propose a new limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z)
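Referring back to the ReduNet entry above: the rate-reduction objective it mentions is the coding rate of the whole feature set minus the size-weighted average of the per-class coding rates, R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z Z^T). The sketch below is a hedged illustration of that objective only; the toy data, epsilon, and dimensions are arbitrary choices, and feature normalization and other details of the actual ReduNet construction are omitted.

```python
# Hedged sketch of the coding-rate-reduction objective behind ReduNet.
# R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T) for features Z of shape (d, n);
# the rate reduction is R of the whole set minus the size-weighted average of the
# per-class rates. Toy data and epsilon are arbitrary illustrative choices.
import numpy as np

def coding_rate(Z, eps=0.5):
    d, n = Z.shape
    # slogdet for numerical stability; [1] is the log-determinant value
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]

def rate_reduction(Z, labels, eps=0.5):
    _, n = Z.shape
    r_whole = coding_rate(Z, eps)
    r_classes = sum(
        (np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return r_whole - r_classes

rng = np.random.default_rng(0)
# Two toy classes with well-separated means; more separated, diverse classes
# yield a larger rate reduction.
Z = np.hstack([rng.normal(+1.0, 1.0, size=(8, 50)),
               rng.normal(-1.0, 1.0, size=(8, 50))])
labels = np.array([0] * 50 + [1] * 50)
print("Delta R =", rate_reduction(Z, labels))
```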
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.