Robust low-rank training via approximate orthonormal constraints
- URL: http://arxiv.org/abs/2306.01485v1
- Date: Fri, 2 Jun 2023 12:22:35 GMT
- Title: Robust low-rank training via approximate orthonormal constraints
- Authors: Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco
Tudisco
- Abstract summary: We introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold.
The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy.
- Score: 2.519906683279153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growth of model and data sizes, a broad effort has been made to
design pruning techniques that reduce the resource demand of deep learning
pipelines, while retaining model performance. In order to reduce both inference
and training costs, a prominent line of work uses low-rank matrix
factorizations to represent the network weights. Although able to retain
accuracy, we observe that low-rank methods tend to compromise model robustness
against adversarial perturbations. By modeling robustness in terms of the
condition number of the neural network, we argue that this loss of robustness
is due to the exploding singular values of the low-rank weight matrices. Thus,
we introduce a robust low-rank training algorithm that maintains the network's
weights on the low-rank matrix manifold while simultaneously enforcing
approximate orthonormal constraints. The resulting model reduces both training
and inference costs while ensuring well-conditioning and thus better
adversarial robustness, without compromising model accuracy. This is shown by
extensive numerical evidence and by our main approximation theorem that shows
the computed robust low-rank network well-approximates the ideal full model,
provided a highly performing low-rank sub-network exists.
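To make the condition-number argument concrete, here is a minimal sketch (PyTorch, illustrative only; the class name, penalty weight, and soft-penalty formulation are assumptions, not the authors' algorithm, which trains directly on the low-rank matrix manifold): a weight parametrized as W = U Vᵀ with a soft orthonormality penalty on the factors. Keeping U and V near-orthonormal keeps the nonzero singular values of W close together, i.e. the effective condition number σ_max/σ_min stays small.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Hypothetical low-rank layer: W = U @ V.T, never formed explicitly."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.rank = rank
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, in) @ (in, rank) @ (rank, out): cost scales with the rank
        return (x @ self.V) @ self.U.T + self.bias

    def orthonormality_penalty(self) -> torch.Tensor:
        # ||U^T U - I||_F^2 + ||V^T V - I||_F^2, zero iff both factors are orthonormal
        eye = torch.eye(self.rank, device=self.U.device)
        return ((self.U.T @ self.U - eye) ** 2).sum() + ((self.V.T @ self.V - eye) ** 2).sum()


layer = LowRankLinear(in_features=128, out_features=64, rank=8)
x = torch.randn(32, 128)
task_loss = layer(x).pow(2).mean()  # placeholder for the actual training loss
loss = task_loss + 1e-2 * layer.orthonormality_penalty()  # 1e-2 is an arbitrary weight
loss.backward()

# Effective condition number sigma_1 / sigma_r of the rank-r weight matrix
with torch.no_grad():
    s = torch.linalg.svdvals(layer.U @ layer.V.T)
    print("effective condition number:", (s[0] / s[layer.rank - 1]).item())
```

The soft penalty above only illustrates why near-orthonormal factors imply well-conditioned weights; the paper enforces the approximate orthonormal constraint while keeping the weights on the low-rank manifold throughout training.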
Related papers
- Soft Merging: A Flexible and Robust Soft Model Merging Approach for
Enhanced Neural Network Performance [6.599368083393398]
Stochastic Gradient Descent (SGD) optimization often converges to local optima, which limits model performance.
The proposed soft merging method mitigates the impact of local optima models that yield undesirable results.
Experiments underscore the effectiveness of the merged networks.
arXiv Detail & Related papers (2023-09-21T17:07:31Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Compression-aware Training of Neural Networks using Frank-Wolfe [27.69586583737247]
We propose a framework that encourages convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition.
Our method is able to outperform existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization.
arXiv Detail & Related papers (2022-05-24T09:29:02Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for
sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Revisit Geophysical Imaging in A New View of Physics-informed Generative
Adversarial Learning [2.12121796606941]
Full waveform inversion (FWI) produces high-resolution subsurface models.
FWI with a least-squares objective suffers from drawbacks such as the local-minima problem.
Recent works relying on partial differential equations and neural networks show promising performance for two-dimensional FWI.
We propose an unsupervised learning paradigm that integrates the wave equation with a discriminator network to accurately estimate physically consistent models.
arXiv Detail & Related papers (2021-09-23T15:54:40Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z) - Achieving Adversarial Robustness via Sparsity [33.11581532788394]
We prove that the sparsity of network weights is closely associated with model robustness.
We propose a novel adversarial training method called inverse weights inheritance.
arXiv Detail & Related papers (2020-09-11T13:15:43Z)