Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
- URL: http://arxiv.org/abs/2310.18882v2
- Date: Fri, 8 Mar 2024 02:13:00 GMT
- Title: Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
- Authors: Changwoo Lee, Hun-Seok Kim
- Abstract summary: This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties.
The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network.
We propose a generalized and differentiable framework to learn efficient structures of weight matrices by gradient descent.
- Score: 16.546708806547137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates efficient deep neural networks (DNNs) to replace
dense unstructured weight matrices with structured ones that possess desired
properties. The challenge arises because the optimal weight matrix structure in
popular neural network models is obscure in most cases and may vary from layer
to layer even in the same network. Prior structured matrices proposed for
efficient DNNs were mostly hand-crafted without a generalized framework to
systematically learn them. To address this issue, we propose a generalized and
differentiable framework to learn efficient structures of weight matrices by
gradient descent. We first define a new class of structured matrices that
covers a wide range of structured matrices in the literature by adjusting the
structural parameters. Then, a frequency-domain differentiable
parameterization scheme based on the Gaussian-Dirichlet kernel is adopted to
learn the structural parameters via proximal gradient descent. On the image and
language tasks, our method learns efficient DNNs with structured matrices,
achieving lower complexity and/or higher performance than prior approaches that
employ low-rank, block-sparse, or block-low-rank matrices.
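
For intuition, the following Python sketch illustrates the overall recipe in a simplified form; the kernel expression, the L1-style penalty, and all hyperparameters below are illustrative assumptions, not the authors' exact formulation. A soft column mask is built from a Gaussian-damped truncated Fourier (Dirichlet-type) series so that its width is a differentiable structural parameter, and that parameter is updated with a proximal gradient step.

    import numpy as np

    def gauss_dirichlet_mask(t, width, sigma=0.05, n_terms=32):
        # Truncated Fourier (Dirichlet-type) series of a boxcar of the given
        # width, damped by a Gaussian factor so the mask stays smooth in `width`.
        k = np.arange(1, n_terms + 1)
        damp = np.exp(-0.5 * (sigma * k) ** 2)
        series = 2.0 * damp * np.sin(np.pi * k * width) / (np.pi * k)
        mask = width + np.cos(2.0 * np.pi * np.outer(t, k)) @ series
        return np.clip(mask, 0.0, 1.0)

    def masked_matvec(W, x, width):
        d = W.shape[1]
        t = np.arange(d) / d - 0.5            # column positions in [-1/2, 1/2)
        m = gauss_dirichlet_mask(t, width)    # ~1 inside the learned band, ~0 outside
        return W @ (m * x)

    def prox_step(width, grad, lr=1e-2, lam=1e-3):
        # gradient step on the task loss, then the prox of lam*|width|
        # (soft-thresholding), which pushes the structure toward fewer active columns
        width = width - lr * grad
        width = max(width - lr * lam, 0.0)
        return min(width, 1.0)

In the paper the differentiable mask is defined in the frequency domain per structural parameter and the proximal operator matches its complexity regularizer; the sketch only conveys the differentiable-mask-plus-proximal-step pattern.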
Related papers
- BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference [15.519068157865023]
We introduce the Block-Level Adaptive STructured (BLAST) matrix to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models.
We demonstrate its efficiency for compressing models on both language and vision tasks.
arXiv Detail & Related papers (2024-10-28T17:56:18Z) - Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices [88.33936714942996]
- Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices [88.33936714942996]
We present a unifying framework that enables searching among all linear operators expressible via an Einstein summation.
We show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables.
A Mixture-of-Experts (MoE) built from these structured operators learns an MoE in every single linear layer of the model, including the projections in the attention blocks.
arXiv Detail & Related papers (2024-10-03T00:44:50Z) - Group and Shuffle: Efficient Structured Orthogonal Parametrization [3.540195249269228]
- Group and Shuffle: Efficient Structured Orthogonal Parametrization [3.540195249269228]
We introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works.
We empirically validate our method on different domains, including adaptation of text-to-image diffusion models and downstream fine-tuning in language modeling.
arXiv Detail & Related papers (2024-06-14T13:29:36Z) - Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show outperforms dense layers at the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - A Structured Sparse Neural Network and Its Matrix Calculations Algorithm [0.0]
- A Structured Sparse Neural Network and Its Matrix Calculations Algorithm [0.0]
We introduce a nonsymmetric tridiagonal matrix with off-diagonal sparse entries and offset sub- and super-diagonals.
For the cases where the matrix inverse does not exist, a least square type pseudoinverse is provided.
Results show a significant improvement in computational cost, especially as the matrix size increases.
arXiv Detail & Related papers (2022-07-02T19:38:48Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
The Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical Polyadic (CP) decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched-prior-based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Block-encoding based quantum algorithm for linear systems with
displacement structures [4.145426157018113]
We present efficient and memory-reduced quantum algorithms for solving linear systems with displacement structures.
The proposed block-encodings provide a quadratic speedup with respect to the dimension over classical algorithms.
One of the quantum linear system solvers is applied to the linear prediction of time series.
arXiv Detail & Related papers (2019-12-27T16:10:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.