Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
- URL: http://arxiv.org/abs/2502.02496v2
- Date: Fri, 07 Feb 2025 11:53:07 GMT
- Title: Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
- Authors: Chris Kolb, Tobias Weber, Bernd Bischl, David RĂ¼gamer,
- Abstract summary: Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging.
A promising alternative is shallow weight factorization, where weights are pruning into two factors, allowing for optimization of $L$penalized neural networks.
In this work, we introduce deep weight factorization, adding differenti factors to more than two previous approaches.
- Score: 10.209740962369453
- License:
- Abstract: Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of $L_1$-penalized neural networks by adding differentiable $L_2$ regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
Related papers
- Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks.
A conventional post-training quantization approach applied to networks with weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
arXiv Detail & Related papers (2023-08-08T21:38:02Z) - Geometry-aware training of factorized layers in tensor Tucker format [6.701651480567394]
We introduce a novel approach to train the factors of a Tucker decomposition of the weight tensors.
Our training proposal proves to be optimal in locally approximating the original unfactorized dynamics.
We provide a theoretical analysis of the algorithm, showing convergence, approximation and local descent guarantees.
arXiv Detail & Related papers (2023-05-30T14:20:51Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata viaak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize Factorized layers in deep nets.
We show how these schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv Detail & Related papers (2021-05-03T17:28:07Z) - Where Should We Begin? A Low-Level Exploration of Weight Initialization
Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weights initialization on the final distributions of weights and activations of different CNN architectures.
To our best knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weights initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with neglibile loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - Reintroducing Straight-Through Estimators as Principled Methods for
Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z) - Continual Learning using a Bayesian Nonparametric Dictionary of Weight
Factors [75.58555462743585]
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings.
We propose a principled nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity.
We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
arXiv Detail & Related papers (2020-04-21T15:20:19Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.