Singular Value Perturbation and Deep Network Optimization
- URL: http://arxiv.org/abs/2203.03099v1
- Date: Mon, 7 Mar 2022 02:09:39 GMT
- Title: Singular Value Perturbation and Deep Network Optimization
- Authors: Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk
- Abstract summary: We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
- Score: 29.204852309828006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop new theoretical results on matrix perturbation to shed light on
the impact of architecture on the performance of a deep network. In particular,
we explain analytically what deep learning practitioners have long observed
empirically: the parameters of some deep architectures (e.g., residual
networks, ResNets, and Dense networks, DenseNets) are easier to optimize than
others (e.g., convolutional networks, ConvNets). Building on our earlier work
connecting deep networks with continuous piecewise-affine splines, we develop
an exact local linear representation of a deep network layer for a family of
modern deep networks that includes ConvNets at one end of a spectrum and
ResNets and DenseNets at the other. For regression tasks that optimize the
squared-error loss, we show that the optimization loss surface of a modern deep
network is piecewise quadratic in the parameters, with local shape governed by
the singular values of a matrix that is a function of the local linear
representation. We develop new perturbation results for how the singular values
of matrices of this sort behave as we add a fraction of the identity and
multiply by certain diagonal matrices. A direct application of our perturbation
results explains analytically why a ResNet is easier to optimize than a
ConvNet: thanks to its more stable singular values and smaller condition
number, the local loss surface of a ResNet or DenseNet is less erratic, less
eccentric, and features local minima that are more accommodating to
gradient-based optimization. Our results also shed new light on the impact of
different nonlinear activation functions on a deep network's singular values,
regardless of its architecture.
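A rough numerical illustration of the abstract's central claim: the sketch below is a loose analogy, not the paper's exact construction. A generic matrix A stands in for a ConvNet-like local linear operator, I + A for its ResNet-like counterpart (the "fraction of the identity" contributed by a skip connection), and a diagonal matrix D for a leaky-ReLU-style activation pattern; the dimension, scales, and entries of D are arbitrary illustrative choices.

```python
# Minimal sketch (illustrative analogy, not the paper's exact construction):
# compare the conditioning of a ConvNet-like local linear operator A with a
# ResNet-like operator I + A, before and after multiplying by a diagonal
# matrix D that mimics a (leaky-)ReLU activation pattern.
import numpy as np

rng = np.random.default_rng(0)
n = 256
A = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # generic layer matrix
D = np.diag(rng.choice([0.1, 1.0], size=n))          # activation-style diagonal
I = np.eye(n)

def cond(M):
    """Ratio of the largest to the smallest singular value of M."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

print("cond(A)           =", cond(A))            # large: ill-conditioned
print("cond(I + A)       =", cond(I + A))        # modest: identity shift helps
print("cond(D @ A)       =", cond(D @ A))        # remains large
print("cond(D @ (I + A)) =", cond(D @ (I + A)))  # remains modest
```

In this toy setting the identity shift keeps the singular values bounded away from zero, so the ResNet-like operator stays well conditioned even after the diagonal multiplication; this is the qualitative behavior that the paper's perturbation bounds make precise.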
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - PirateNets: Physics-informed Deep Learning with Residual Adaptive
Networks [19.519831541375144]
We introduce Physics-informed Residual Adaptive Networks (PirateNets) to facilitate stable and efficient training of deep PINN models.
PirateNets leverage a novel adaptive residual connection, which allows the networks to be initialized as shallow networks that progressively deepen during training.
We show that PirateNets are easier to optimize and can gain accuracy from considerably increased depth, ultimately achieving state-of-the-art results across various benchmarks.
arXiv Detail & Related papers (2024-02-01T04:17:56Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of training epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - DreamNet: A Deep Riemannian Network based on SPD Manifold Learning for
Visual Classification [36.848148506610364]
We propose a new architecture for SPD matrix learning.
To enrich the deep representations, we adopt SPDNet as the backbone.
We then insert several residual-like blocks with shortcut connections to augment the representational capacity of SRAE.
arXiv Detail & Related papers (2022-06-16T07:15:20Z) - Edge Rewiring Goes Neural: Boosting Network Resilience via Policy
Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
arXiv Detail & Related papers (2021-10-18T06:14:28Z) - ReduNet: A White-box Deep Network from the Principle of Maximizing Rate
Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks (a minimal sketch of this rate-reduction objective appears after this list).
arXiv Detail & Related papers (2021-05-21T16:29:57Z) - Kernel-Based Smoothness Analysis of Residual Networks [85.20737467304994]
Residual networks (ResNets) stand out among these powerful modern architectures.
In this paper, we show another distinction between the two models, namely, a tendency of ResNets to promote smoother interpolations than their non-residual counterparts.
arXiv Detail & Related papers (2020-09-21T16:32:04Z) - Eigendecomposition-Free Training of Deep Networks for Linear
Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, even though the optimization landscape is highly non-convex.
We propose a new limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z)
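Referring back to the ReduNet entry above: the rate-reduction objective it mentions is the coding rate of the whole feature set minus the size-weighted average of the per-class coding rates, R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z Z^T). The sketch below is a hedged illustration of that objective only; the toy data, epsilon, and dimensions are arbitrary choices, and feature normalization and other details of the actual ReduNet construction are omitted.

```python
# Hedged sketch of the coding-rate-reduction objective behind ReduNet.
# R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T) for features Z of shape (d, n);
# the rate reduction is R of the whole set minus the size-weighted average of the
# per-class rates. Toy data and epsilon are arbitrary illustrative choices.
import numpy as np

def coding_rate(Z, eps=0.5):
    d, n = Z.shape
    # slogdet for numerical stability; [1] is the log-determinant value
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]

def rate_reduction(Z, labels, eps=0.5):
    _, n = Z.shape
    r_whole = coding_rate(Z, eps)
    r_classes = sum(
        (np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return r_whole - r_classes

rng = np.random.default_rng(0)
# Two toy classes with well-separated means; more separated, diverse classes
# yield a larger rate reduction.
Z = np.hstack([rng.normal(+1.0, 1.0, size=(8, 50)),
               rng.normal(-1.0, 1.0, size=(8, 50))])
labels = np.array([0] * 50 + [1] * 50)
print("Delta R =", rate_reduction(Z, labels))
```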
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.