Compression-aware Training of Neural Networks using Frank-Wolfe
- URL: http://arxiv.org/abs/2205.11921v2
- Date: Wed, 14 Feb 2024 16:43:50 GMT
- Title: Compression-aware Training of Neural Networks using Frank-Wolfe
- Authors: Max Zimmer and Christoph Spiegel and Sebastian Pokutta
- Abstract summary: We propose a framework that encourages convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition.
Our method is able to outperform existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization.
- Score: 27.69586583737247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing Neural Network pruning approaches rely on either retraining or
inducing a strong bias in order to converge to a sparse solution throughout
training. A third paradigm, 'compression-aware' training, aims to obtain
state-of-the-art dense models that are robust to a wide range of compression
ratios using a single dense training run while also avoiding retraining. We
propose a framework centered around a versatile family of norm constraints and
the Stochastic Frank-Wolfe (SFW) algorithm that encourage convergence to
well-performing solutions while inducing robustness towards convolutional
filter pruning and low-rank matrix decomposition. Our method is able to
outperform existing compression-aware approaches and, in the case of low-rank
matrix decomposition, it also requires significantly less computational
resources than approaches based on nuclear-norm regularization. Our findings
indicate that dynamically adjusting the learning rate of SFW, as suggested by
Pokutta et al. (2020), is crucial for convergence and robustness of SFW-trained
models and we establish a theoretical foundation for that practice.
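For illustration, below is a minimal sketch of a single Stochastic Frank-Wolfe update over an L2-norm ball, with a gradient-rescaled learning rate in the spirit of Pokutta et al. (2020). The constraint choice, hyperparameters, and helper names are illustrative assumptions rather than the paper's exact implementation; the paper considers a broader family of norm constraints.

```python
import torch

@torch.no_grad()
def sfw_step(param: torch.Tensor, grad: torch.Tensor, tau: float, lr: float) -> None:
    """One SFW update: param <- param + eta * (v - param), staying inside the constraint set."""
    # Linear Minimization Oracle (LMO) over the L2 ball of radius tau:
    #   v = argmin_{||v||_2 <= tau} <grad, v> = -tau * grad / ||grad||_2
    v = -tau * grad / (grad.norm() + 1e-12)
    d = v - param
    # Gradient rescaling of the learning rate (clipped to 1): scale the step by
    # ||grad|| / ||v - param|| so step sizes stay comparable across layers.
    eta = min(lr * grad.norm().item() / (d.norm().item() + 1e-12), 1.0)
    param.add_(d, alpha=eta)

# Usage sketch: after loss.backward(), apply sfw_step(p, p.grad, tau, lr) to each parameter p.
```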
Related papers
- Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically-justified technique termed Low-Rank Induced Training (LoRITa).
LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation.
We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
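As a rough illustration of the singular value truncation step mentioned above, the sketch below replaces a (possibly composed) weight matrix with its best rank-r approximation; the rank and tensor shapes are illustrative assumptions, not values from the paper.

```python
import torch

def truncated_svd_compress(W: torch.Tensor, rank: int):
    """Return low-rank factors (A, B) with A @ B ~= W, keeping only the top `rank` singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # shape (m, rank)
    B = Vh[:rank, :]             # shape (rank, n)
    return A, B

# Example: compress the composition W2 @ W1 of two linear layers into rank-8 factors.
# W1, W2 = torch.randn(256, 512), torch.randn(128, 256)
# A, B = truncated_svd_compress(W2 @ W1, rank=8)
```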
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
Convergence guarantees and generalizability of the unrolled networks, however, remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
We examine the use of convex neural recovery models for Deep Neural Networks (DNNs).
We show that the stationary points of the non-convex training objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Riemannian Low-Rank Model Compression for Federated Learning with Over-the-Air Aggregation [2.741266294612776]
Low-rank model compression is a widely used technique for reducing the computational load when training machine learning models.
Existing compression techniques are not directly applicable to efficient over-the-air (OTA) aggregation in federated learning systems.
We propose a novel manifold optimization formulation for low-rank model compression in FL that does not relax the low-rank constraint.
arXiv Detail & Related papers (2023-06-04T18:32:50Z)
- Robust low-rank training via approximate orthonormal constraints [2.519906683279153]
We introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold.
The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy.
arXiv Detail & Related papers (2023-06-02T12:22:35Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Learning Robust Kernel Ensembles with Kernel Average Pooling [3.6540368812166872]
We introduce Kernel Average Pooling (KAP), a neural network building block that applies the mean filter along the kernel dimension of the layer activation tensor.
We show that ensembles of kernels with similar functionality naturally emerge in convolutional neural networks equipped with KAP and trained with backpropagation.
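A minimal sketch of such a mean filter along the kernel (channel) dimension of an activation tensor is given below; the window size, padding scheme, and function name are illustrative assumptions rather than the paper's exact KAP layer.

```python
import torch
import torch.nn.functional as F

def kernel_average_pooling(x: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Mean filter of window k along the channel (kernel) dimension of x with shape (N, C, H, W)."""
    N, C, H, W = x.shape
    # Move channels to the last dimension so avg_pool1d slides across kernels.
    y = x.permute(0, 2, 3, 1).reshape(N * H * W, 1, C)
    y = F.avg_pool1d(y, kernel_size=k, stride=1, padding=k // 2, count_include_pad=False)
    return y.reshape(N, H, W, -1).permute(0, 3, 1, 2)
```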
arXiv Detail & Related papers (2022-09-30T19:49:14Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective approach for improving model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs [8.597091257152567]
We present a dynamic network rewiring (DNR) method to generate pruned deep neural network (DNN) models that are robust against adversarial attacks.
Our experiments show that DNR consistently finds compressed models with better clean and adversarial image classification performance than what is achievable through state-of-the-art alternatives.
arXiv Detail & Related papers (2020-11-03T19:49:00Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
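As a hedged illustration of one discrete step of a matrix flow on O(d), the sketch below multiplies an orthogonal matrix by the exponential of a skew-symmetric generator, which keeps it on the group; the generator, step size, and function name are illustrative assumptions rather than the paper's ODEtoODE construction.

```python
import torch

def orthogonal_flow_step(W: torch.Tensor, A: torch.Tensor, h: float) -> torch.Tensor:
    """One discrete step of a flow on O(d): W <- W @ expm(h * (A - A^T)).
    Since expm of a skew-symmetric matrix is orthogonal, W stays orthogonal."""
    skew = A - A.T
    return W @ torch.matrix_exp(h * skew)
```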
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.