Kaleidoscope: An Efficient, Learnable Representation For All Structured
Linear Maps
- URL: http://arxiv.org/abs/2012.14966v2
- Date: Tue, 5 Jan 2021 07:29:16 GMT
- Title: Kaleidoscope: An Efficient, Learnable Representation For All Structured
Linear Maps
- Authors: Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder,
Megan Leszczynski, Atri Rudra, Christopher Ré
- Abstract summary: We introduce kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space.
K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures.
We use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
- Score: 20.151950843660973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural network architectures use structured linear transformations,
such as low-rank matrices, sparse matrices, permutations, and the Fourier
transform, to improve inference speed and reduce memory usage compared to
general linear maps. However, choosing which of the myriad structured
transformations to use (and its associated parameterization) is a laborious
task that requires trading off speed, space, and accuracy. We consider a
different approach: we introduce a family of matrices called kaleidoscope
matrices (K-matrices) that provably capture any structured matrix with
near-optimal space (parameter) and time (arithmetic operation) complexity. We
empirically validate that K-matrices can be automatically learned within
end-to-end pipelines to replace hand-crafted procedures, in order to improve
model quality. For example, replacing channel shuffles in ShuffleNet improves
classification accuracy on ImageNet by up to 5%. K-matrices can also simplify
hand-engineered pipelines -- we replace filter bank feature computation in
speech data preprocessing with a learnable kaleidoscope layer, resulting in
only 0.4% loss in accuracy on the TIMIT speech recognition task. In addition,
K-matrices can capture latent structure in models: for a challenging permuted
image classification task, a K-matrix based representation of permutations is
able to learn the right latent structure and improves accuracy of a downstream
convolutional model by over 9%. We provide a practically efficient
implementation of our approach, and use K-matrices in a Transformer network to
attain 36% faster end-to-end inference speed on a language translation task.
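A kaleidoscope matrix is built from butterfly factors: block-diagonal matrices whose 2x2 blocks mix coordinate pairs at successively halved strides, combined with transposed butterflies (the BB* pattern described in the paper). The sketch below is a minimal, illustrative NumPy rendering of that structure, not the authors' implementation: it materializes each factor as a dense matrix for clarity, whereas the point of the representation is that the factors are kept factored so a product of log2(n) of them is applied in O(n log n) operations, and all function names here are hypothetical.

import numpy as np

def butterfly_factor(n, stride, rng):
    """One butterfly factor: block-diagonal with 2x2 blocks that mix the
    coordinate pair (i, i + stride). Dense here only for illustration."""
    B = np.zeros((n, n))
    for block_start in range(0, n, 2 * stride):
        for i in range(block_start, block_start + stride):
            j = i + stride
            a, b, c, d = rng.standard_normal(4)   # the learnable 2x2 block
            B[i, i], B[i, j] = a, b
            B[j, i], B[j, j] = c, d
    return B

def butterfly(n, rng):
    """Butterfly matrix: product of log2(n) factors with strides n/2, ..., 2, 1."""
    assert n > 1 and n & (n - 1) == 0, "n must be a power of 2"
    B = np.eye(n)
    stride = n // 2
    while stride >= 1:
        B = butterfly_factor(n, stride, rng) @ B
        stride //= 2
    return B

def kaleidoscope_block(n, rng):
    """One B B^T block; the paper stacks and widens such blocks to capture
    arbitrary structured linear maps with near-optimal parameter counts."""
    return butterfly(n, rng) @ butterfly(n, rng).T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = kaleidoscope_block(8, rng)
    x = rng.standard_normal(8)
    print(K @ x)   # stays O(n log n) if the factors are never multiplied out

In a real layer the 2x2 block entries are the learnable parameters and gradients flow through the factored application; multiplying the factors into a dense matrix, as done above, is only a way to check the structure.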
Related papers
- Variable-size Symmetry-based Graph Fourier Transforms for image compression [65.7352685872625]
We propose a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes and integrate them into a coding framework.
Our proposed algorithm generates symmetric graphs on the grid by adding specific symmetrical connections between nodes.
Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection.
arXiv Detail & Related papers (2024-11-24T13:00:44Z)
- MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [43.39466934693055]
We present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective.
This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers.
We conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2024-11-20T02:41:53Z)
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense layers for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- Dynamic Shuffle: An Efficient Channel Mixture Method [8.720510396996142]
We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly improves ShuffleNets' performance.
arXiv Detail & Related papers (2023-10-04T12:47:48Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform in-context learning (ICL).
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing [4.4886210896619945]
We propose a modified random vector functional link (RVFL) network that avoids computationally expensive matrix operations during training.
The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository.
arXiv Detail & Related papers (2021-06-17T21:17:17Z)
- Container: Context Aggregation Network [83.12004501984043]
A recent finding shows that a simple solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient variant, Container-Light, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
- Metalearning: Sparse Variable-Structure Automata [0.0]
We propose a metalearning approach that increases the number of basis vectors used in dynamic sparse coding on the fly.
An actor-critic algorithm is deployed to automatically choose an appropriate feature dimension with respect to the required level of accuracy.
arXiv Detail & Related papers (2021-01-30T21:32:23Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing; a generic sketch of the filter-separation idea follows this entry.
Convolutional neural networks (CNNs) are the standard approach to image recognition, even though they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
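As a generic illustration of the filter-separation idea referenced in the last entry (not the specific transformation proposed in that paper), a rank-1 k x k convolution filter can be applied as a k x 1 pass followed by a 1 x k pass. The NumPy/SciPy sketch below, with all names chosen here for illustration, checks that equivalence.

import numpy as np
from scipy.signal import convolve2d

# A rank-1 k x k filter factors exactly into a k x 1 and a 1 x k filter,
# replacing roughly k*k multiply-adds per output pixel with about 2*k.
k = 5
col = np.random.randn(k, 1)
row = np.random.randn(1, k)
full_filter = col @ row                       # rank-1 k x k filter

image = np.random.randn(64, 64)
direct = convolve2d(image, full_filter, mode="valid")
separated = convolve2d(convolve2d(image, col, mode="valid"), row, mode="valid")
print(np.allclose(direct, separated))         # True: the two passes reproduce the full filter

Real filters are usually not exactly rank-1, so separated-filter methods approximate them with one or a few such rank-1 terms, trading a small accuracy loss for fewer multiply-adds.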
This list is automatically generated from the titles and abstracts of the papers on this site.