Rank-adaptive spectral pruning of convolutional layers during training
- URL: http://arxiv.org/abs/2305.19059v1
- Date: Tue, 30 May 2023 14:20:51 GMT
- Title: Rank-adaptive spectral pruning of convolutional layers during training
- Authors: Emanuele Zangrando, Steffen Schotthöfer, Gianluca Ceruti, Jonas Kusch, Francesco Tudisco
- Abstract summary: We propose a low-parametric training method that factorizes the convolutions into tensor Tucker format and adaptively prunes the Tucker ranks of the convolutional kernel during training.
We obtain a robust training algorithm that provably approximates the full baseline performance and guarantees loss descent.
A variety of experiments against the full model and alternative low-rank baselines are implemented, showing that the proposed method drastically reduces the training costs, while achieving high performance, comparable to or better than the full baseline, and consistently outperforms competing low-rank approaches.
- Score: 2.3488056916440856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computing cost and memory demand of deep learning pipelines have grown
fast in recent years and thus a variety of pruning techniques have been
developed to reduce model parameters. The majority of these techniques focus on
reducing inference costs by pruning the network after a pass of full training.
A smaller number of methods address the reduction of training costs, mostly
based on compressing the network via low-rank layer factorizations. Despite
their efficiency for linear layers, these methods fail to effectively handle
convolutional filters. In this work, we propose a low-parametric training
method that factorizes the convolutions into tensor Tucker format and
adaptively prunes the Tucker ranks of the convolutional kernel during training.
Leveraging fundamental results from geometric integration theory of
differential equations on tensor manifolds, we obtain a robust training
algorithm that provably approximates the full baseline performance and
guarantees loss descent. A variety of experiments against the full model and
alternative low-rank baselines are implemented, showing that the proposed
method drastically reduces the training costs, while achieving high
performance, comparable to or better than the full baseline, and consistently
outperforms competing low-rank approaches.
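As an illustration of the kind of layer the abstract describes, here is a minimal sketch (assuming PyTorch) of a Tucker-2 factorized convolution with a naive singular-value threshold standing in for rank-adaptive pruning. The class and parameter names (TuckerConv2d, rank_in, rank_out, tol) are illustrative only; the paper factorizes the full kernel in Tucker format and derives its rank adaptation from geometric integration on tensor manifolds, which this toy thresholding rule does not reproduce.

```python
# Minimal sketch (not the authors' code): a Tucker-2 factorized conv layer with a
# naive singular-value threshold as a stand-in for rank-adaptive pruning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TuckerConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, rank_in, rank_out, padding=0):
        super().__init__()
        # W ~ core x_1 U_out x_2 U_in  (channel modes only, i.e. Tucker-2)
        self.U_in = nn.Parameter(torch.randn(in_ch, rank_in) / in_ch**0.5)
        self.core = nn.Parameter(torch.randn(rank_out, rank_in, kernel_size, kernel_size) / rank_in)
        self.U_out = nn.Parameter(torch.randn(out_ch, rank_out) / rank_out**0.5)
        self.padding = padding

    def forward(self, x):
        # 1x1 conv projecting C_in -> r_in, then k x k conv on the core, then 1x1 conv r_out -> C_out
        x = F.conv2d(x, self.U_in.t().unsqueeze(-1).unsqueeze(-1))
        x = F.conv2d(x, self.core, padding=self.padding)
        return F.conv2d(x, self.U_out.unsqueeze(-1).unsqueeze(-1))

    @torch.no_grad()
    def prune_ranks(self, tol=1e-2):
        # Toy rank adaptation: drop output-mode directions whose singular values
        # (of the mode-1 unfolding of the core) fall below tol * sigma_max.
        unfold = self.core.flatten(start_dim=1)             # (r_out, r_in*k*k)
        U, S, Vh = torch.linalg.svd(unfold, full_matrices=False)
        r = max(1, int((S > tol * S[0]).sum()))
        self.U_out = nn.Parameter(self.U_out @ U[:, :r])    # absorb rotation, shrink r_out
        self.core = nn.Parameter((torch.diag(S[:r]) @ Vh[:r]).reshape(r, *self.core.shape[1:]))

layer = TuckerConv2d(64, 128, 3, rank_in=16, rank_out=16, padding=1)
y = layer(torch.randn(2, 64, 32, 32))   # -> shape (2, 128, 32, 32)
layer.prune_ranks(tol=0.05)             # forward pass still works with the reduced rank
```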
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries [10.209740962369453]
Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging.
A promising alternative is shallow weight factorization, where weights are factorized into two factors, allowing for optimization of $L$-penalized neural networks.
In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors (a minimal two-factor sketch appears after this list).
arXiv Detail & Related papers (2025-02-04T17:12:56Z) - tCURLoRA: Tensor CUR Decomposition Based Low-Rank Parameter Adaptation and Its Application in Medical Image Segmentation [1.3281936946796913]
Transfer learning, by leveraging knowledge from pre-trained models, has significantly enhanced the performance of target tasks.
As deep neural networks scale up, full fine-tuning introduces substantial computational and storage challenges.
We propose tCURLoRA, a novel fine-tuning method based on tensor CUR decomposition.
arXiv Detail & Related papers (2025-01-04T08:25:32Z) - Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme [0.0]
Emergence in machine learning refers to the spontaneous appearance of capabilities that arise from the scale and structure of training data.
We introduce a novel yet straightforward neural network initialization scheme that aims at achieving greater potential for emergence.
We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization.
arXiv Detail & Related papers (2024-07-26T18:56:47Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms such as low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture [0.0]
This work presents a two-stage adaptive framework for developing deep neural network (DNN) architectures that generalize well for a given training data set.
In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm.
arXiv Detail & Related papers (2022-11-13T09:51:16Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Łojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Defensive Tensorization [113.96183766922393]
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks.
We validate the versatility of our approach across domains and low-precision architectures by considering an audio task and binary networks.
arXiv Detail & Related papers (2021-10-26T17:00:16Z) - Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep nets.
We show how these schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv Detail & Related papers (2021-05-03T17:28:07Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - A Multi-Scale Tensor Network Architecture for Classification and
Regression [0.0]
We present an algorithm for supervised learning using tensor networks.
We employ a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations.
We show how fine-graining through the network may be used to initialize models with access to finer-scale features.
arXiv Detail & Related papers (2020-01-22T21:26:28Z)
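As referenced in the Deep Weight Factorization entry above, here is a minimal sketch (assuming PyTorch) of the shallow two-factor weight factorization idea: writing a weight as an elementwise product of two factors and applying ordinary weight decay to the factors acts as a sparsity-promoting penalty on the effective weight. The names (FactorizedLinear, u, v) are illustrative and not taken from any of the listed papers.

```python
# Minimal sketch (not from any listed paper's code): shallow two-factor weight
# factorization. Plain L2 weight decay on the factors u and v induces an
# L1-like, sparsity-promoting penalty on the effective weight u * v.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.u = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Effective weight is the elementwise (Hadamard) product of the factors.
        return F.linear(x, self.u * self.v, self.bias)

layer = FactorizedLinear(32, 8)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2, weight_decay=1e-3)  # L2 on factors ~ L1 on u*v
loss = layer(torch.randn(16, 32)).pow(2).mean()
loss.backward()
opt.step()
```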