Group and Shuffle: Efficient Structured Orthogonal Parametrization
- URL: http://arxiv.org/abs/2406.10019v1
- Date: Fri, 14 Jun 2024 13:29:36 GMT
- Title: Group and Shuffle: Efficient Structured Orthogonal Parametrization
- Authors: Mikhail Gorbunov, Nikolay Yudin, Vera Soboleva, Aibek Alanov, Alexey Naumov, Maxim Rakhuba
- Abstract summary: We introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We empirically validate our method on different domains, including the adaptation of text-to-image diffusion models and downstream task fine-tuning in language modeling.
- Score: 3.540195249269228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing size of neural networks has led to a growing demand for methods of efficient fine-tuning. Recently, an orthogonal fine-tuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties of this class and build a structured orthogonal parametrization upon it. We then use this parametrization to modify the orthogonal fine-tuning framework, improving parameter and computational efficiency. We empirically validate our method on different domains, including the adaptation of text-to-image diffusion models and downstream task fine-tuning in language modeling. Additionally, we adapt our construction for orthogonal convolutions and conduct experiments with 1-Lipschitz neural networks.
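To make the group-and-shuffle idea concrete, here is a minimal PyTorch sketch of the flavor of construction the title suggests: block-diagonal orthogonal factors (the "groups") interleaved with a permutation (the "shuffle"). Since every factor is orthogonal, so is their product. The factor count, block sizes, and the particular permutation below are illustrative assumptions, not the paper's exact GS class.

```python
import torch

def block_diag_orthogonal(num_blocks: int, block_size: int) -> torch.Tensor:
    """One orthogonal block per group, assembled into a block-diagonal matrix."""
    blocks = []
    for _ in range(num_blocks):
        q, _ = torch.linalg.qr(torch.randn(block_size, block_size))
        blocks.append(q)  # q is orthogonal by construction
    return torch.block_diag(*blocks)

def gs_orthogonal(n: int, num_blocks: int) -> torch.Tensor:
    """'Group': block-diagonal orthogonal factors; 'shuffle': a fixed
    permutation between them, so information mixes across blocks."""
    block_size = n // num_blocks
    b1 = block_diag_orthogonal(num_blocks, block_size)
    b2 = block_diag_orthogonal(num_blocks, block_size)
    shuffle = torch.arange(n).reshape(num_blocks, block_size).t().reshape(-1)
    p = torch.eye(n)[shuffle]  # permutation matrix
    return b2 @ p @ b1  # a product of orthogonal factors is orthogonal

w = gs_orthogonal(16, 4)
assert torch.allclose(w @ w.t(), torch.eye(16), atol=1e-5)
```

Storing only the small blocks and the permutation is what makes such a parametrization cheaper than a dense orthogonal matrix.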
Related papers
- Linear Chain Transformation: Expanding Optimization Dynamics for Fine-Tuning Large Language Models [11.314144876785823]
Linear Chain Transformation (LinChain) is a novel approach that introduces a sequence of linear transformations during fine-tuning to enrich optimization dynamics.
By incorporating multiple linear transformations into the parameter update process, LinChain expands the effective rank of updates and enhances the model's ability to learn complex task-specific representations.
Our experiments on various benchmark tasks show that LinChain leads to better generalization, fewer learnable parameters, and improved task adaptation.
arXiv Detail & Related papers (2024-10-29T14:07:24Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural-network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Differentiable Learning of Generalized Structured Matrices for Efficient
Deep Neural Networks [16.546708806547137]
This paper investigates efficient deep neural networks (DNNs) that replace dense unstructured weight matrices with structured ones possessing desired properties.
The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network.
We propose a generalized and differentiable framework to learn efficient structures of weight matrices by gradient descent.
arXiv Detail & Related papers (2023-10-29T03:07:30Z) - Orthogonal Transforms in Neural Networks Amount to Effective
Regularization [0.0]
We consider applications of neural networks in nonlinear system identification, using architectures with built-in orthogonal transforms.
We show that such a structure is a universal approximator.
We empirically show, in particular, that such a structure, using the Fourier transform, outperforms equivalent models without orthogonality support.
arXiv Detail & Related papers (2023-05-10T17:52:33Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Tensor-based Sequential Learning via Hankel Matrix Representation for
- Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations [0.0]
Self-attentive transformer models have been shown to solve the next item recommendation task very efficiently.
Motivated by the special structure of the learned parameter space, we ask whether it is possible to mimic it with an alternative, more lightweight approach.
We develop a new tensor factorization-based model that ingrains the structural knowledge about sequential data within the learning process.
arXiv Detail & Related papers (2022-12-12T05:55:40Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a re-parameterized optimizer, referred to as RepOpt-VGG, performs on par with recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Structured Reordering for Modeling Latent Alignments in Sequence
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)