Kaleidoscope: An Efficient, Learnable Representation For All Structured
Linear Maps
- URL: http://arxiv.org/abs/2012.14966v2
- Date: Tue, 5 Jan 2021 07:29:16 GMT
- Title: Kaleidoscope: An Efficient, Learnable Representation For All Structured
Linear Maps
- Authors: Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder,
Megan Leszczynski, Atri Rudra, Christopher Ré
- Abstract summary: We introduce kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space.
K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures.
We use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
- Score: 20.151950843660973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural network architectures use structured linear transformations,
such as low-rank matrices, sparse matrices, permutations, and the Fourier
transform, to improve inference speed and reduce memory usage compared to
general linear maps. However, choosing which of the myriad structured
transformations to use (and its associated parameterization) is a laborious
task that requires trading off speed, space, and accuracy. We consider a
different approach: we introduce a family of matrices called kaleidoscope
matrices (K-matrices) that provably capture any structured matrix with
near-optimal space (parameter) and time (arithmetic operation) complexity. We
empirically validate that K-matrices can be automatically learned within
end-to-end pipelines to replace hand-crafted procedures, in order to improve
model quality. For example, replacing channel shuffles in ShuffleNet improves
classification accuracy on ImageNet by up to 5%. K-matrices can also simplify
hand-engineered pipelines -- we replace filter bank feature computation in
speech data preprocessing with a learnable kaleidoscope layer, resulting in
only 0.4% loss in accuracy on the TIMIT speech recognition task. In addition,
K-matrices can capture latent structure in models: for a challenging permuted
image classification task, a K-matrix based representation of permutations is
able to learn the right latent structure and improves accuracy of a downstream
convolutional model by over 9%. We provide a practically efficient
implementation of our approach, and use K-matrices in a Transformer network to
attain 36% faster end-to-end inference speed on a language translation task.
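A kaleidoscope matrix is built from butterfly factors: block-diagonal matrices whose 2x2 blocks mix coordinate pairs at successively halved strides, combined with transposed butterflies (the BB* pattern described in the paper). The sketch below is a minimal, illustrative NumPy rendering of that structure, not the authors' implementation: it materializes each factor as a dense matrix for clarity, whereas the point of the representation is that the factors are kept factored so a product of log2(n) of them is applied in O(n log n) operations, and all function names here are hypothetical.

import numpy as np

def butterfly_factor(n, stride, rng):
    """One butterfly factor: block-diagonal with 2x2 blocks that mix the
    coordinate pair (i, i + stride). Dense here only for illustration."""
    B = np.zeros((n, n))
    for block_start in range(0, n, 2 * stride):
        for i in range(block_start, block_start + stride):
            j = i + stride
            a, b, c, d = rng.standard_normal(4)   # the learnable 2x2 block
            B[i, i], B[i, j] = a, b
            B[j, i], B[j, j] = c, d
    return B

def butterfly(n, rng):
    """Butterfly matrix: product of log2(n) factors with strides n/2, ..., 2, 1."""
    assert n > 1 and n & (n - 1) == 0, "n must be a power of 2"
    B = np.eye(n)
    stride = n // 2
    while stride >= 1:
        B = butterfly_factor(n, stride, rng) @ B
        stride //= 2
    return B

def kaleidoscope_block(n, rng):
    """One B B^T block; the paper stacks and widens such blocks to capture
    arbitrary structured linear maps with near-optimal parameter counts."""
    return butterfly(n, rng) @ butterfly(n, rng).T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = kaleidoscope_block(8, rng)
    x = rng.standard_normal(8)
    print(K @ x)   # stays O(n log n) if the factors are never multiplied out

In a real layer the 2x2 block entries are the learnable parameters and gradients flow through the factored application; multiplying the factors into a dense matrix, as done above, is only a way to check the structure.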
Related papers
- Variable-size Symmetry-based Graph Fourier Transforms for image compression [65.7352685872625]
We propose a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes and integrate them into a coding framework.
Our proposed algorithm generates symmetric graphs on the grid by adding specific symmetrical connections between nodes.
Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection.
arXiv Detail & Related papers (2024-11-24T13:00:44Z)
- MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [43.39466934693055]
We present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective.
This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers.
We conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2024-11-20T02:41:53Z)
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense layers for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- Dynamic Shuffle: An Efficient Channel Mixture Method [8.720510396996142]
We devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling.
Experimental results on image classification benchmark datasets show that our method significantly improves ShuffleNets' performance.
arXiv Detail & Related papers (2023-10-04T12:47:48Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform in-context learning (ICL).
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing [4.4886210896619945]
We propose a modified random vector functional link (RVFL) network that avoids computationally expensive matrix operations during training.
The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository.
arXiv Detail & Related papers (2021-06-17T21:17:17Z)
- Container: Context Aggregation Network [83.12004501984043]
A recent finding shows that a simple solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient variant, Container-Light, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
- Metalearning: Sparse Variable-Structure Automata [0.0]
We propose a metalearning approach that increases the number of basis vectors used in dynamic sparse coding on the fly.
An actor-critic algorithm is deployed to automatically choose an appropriate feature dimension with respect to the required level of accuracy.
arXiv Detail & Related papers (2021-01-30T21:32:23Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing; a generic sketch of the filter-separation idea follows this entry.
Convolutional neural networks (CNNs) are the standard approach to image recognition, even though they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
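As a generic illustration of the filter-separation idea referenced in the last entry (not the specific transformation proposed in that paper), a rank-1 k x k convolution filter can be applied as a k x 1 pass followed by a 1 x k pass. The NumPy/SciPy sketch below, with all names chosen here for illustration, checks that equivalence.

import numpy as np
from scipy.signal import convolve2d

# A rank-1 k x k filter factors exactly into a k x 1 and a 1 x k filter,
# replacing roughly k*k multiply-adds per output pixel with about 2*k.
k = 5
col = np.random.randn(k, 1)
row = np.random.randn(1, k)
full_filter = col @ row                       # rank-1 k x k filter

image = np.random.randn(64, 64)
direct = convolve2d(image, full_filter, mode="valid")
separated = convolve2d(convolve2d(image, col, mode="valid"), row, mode="valid")
print(np.allclose(direct, separated))         # True: the two passes reproduce the full filter

Real filters are usually not exactly rank-1, so separated-filter methods approximate them with one or a few such rank-1 terms, trading a small accuracy loss for fewer multiply-adds.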
This list is automatically generated from the titles and abstracts of the papers on this site.