Compressing 1D Time-Channel Separable Convolutions using Sparse Random
Ternary Matrices
- URL: http://arxiv.org/abs/2103.17142v3
- Date: Fri, 2 Apr 2021 04:48:06 GMT
- Title: Compressing 1D Time-Channel Separable Convolutions using Sparse Random
Ternary Matrices
- Authors: Gonçalo Mordido, Matthijs Van Keirsbilck, and Alexander Keller
- Abstract summary: We replace 1x1-convolutions in 1D time-channel separable convolutions with constant, sparse random ternary matrices with weights in $\{-1,0,+1\}$.
For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from $97.21\%$ to $97.41\%$ at the same network size.
For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about $1\%$ of the floating-point baseline's word error rate.
- Score: 65.4388266814055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate that 1x1-convolutions in 1D time-channel separable
convolutions may be replaced by constant, sparse random ternary matrices with
weights in $\{-1,0,+1\}$. Such layers do not perform any multiplications and do
not require training. Moreover, the matrices may be generated on the chip
during computation and therefore do not require any memory access. With the
same parameter budget, we can afford deeper and more expressive models,
improving the Pareto frontiers of existing models on several tasks. For command
recognition on Google Speech Commands v1, we improve the state-of-the-art
accuracy from $97.21\%$ to $97.41\%$ at the same network size. Alternatively,
we can lower the cost of existing models. For speech recognition on
Librispeech, we halve the number of weights to be trained while only sacrificing
about $1\%$ of the floating-point baseline's word error rate.
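As an illustration of the core idea, the replaced layer can be sketched in a few lines. The following is a minimal, hypothetical sketch (not the authors' code): it assumes NumPy, an illustrative sparsity of 90%, and made-up helper names (`random_ternary_matrix`, `ternary_pointwise_conv`). It shows how a constant ternary matrix can be regenerated from a seed and applied to a (channels, time) feature map using only additions and subtractions.

```python
import numpy as np

def random_ternary_matrix(out_channels, in_channels, sparsity=0.9, seed=0):
    """Generate a constant sparse ternary matrix with entries in {-1, 0, +1}.

    Because the matrix is fully determined by the seed, it can be regenerated
    on demand instead of being stored or trained.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1, 1], size=(out_channels, in_channels))  # +1/-1 with equal probability
    mask = rng.random((out_channels, in_channels)) >= sparsity     # keep roughly (1 - sparsity) of entries
    return (signs * mask).astype(np.int8)

def ternary_pointwise_conv(x, W):
    """Apply the 1x1 (pointwise) convolution y[c_out, t] = sum_c W[c_out, c] * x[c, t].

    Since W only contains -1, 0, +1, the weighted sum reduces to adding and
    subtracting input channels; no multiplications are needed.
    """
    out = np.zeros((W.shape[0], x.shape[1]), dtype=x.dtype)
    for c_out in range(W.shape[0]):
        plus = W[c_out] == 1
        minus = W[c_out] == -1
        out[c_out] = x[plus].sum(axis=0) - x[minus].sum(axis=0)
    return out

# Example: 64 input channels, 128 output channels, 100 time steps.
x = np.random.randn(64, 100).astype(np.float32)
W = random_ternary_matrix(128, 64, sparsity=0.9, seed=42)
y = ternary_pointwise_conv(x, W)
print(y.shape)  # (128, 100)
```

In a time-channel separable block, such a fixed pointwise stage would follow the time-wise (depthwise) convolution, with the remaining layers trained as usual.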
Related papers
- ReALLM: A general framework for LLM compression and fine-tuning [11.738510106847414]
ReALLM is a novel approach for compression and memory-efficient adaptation of pre-trained language models.
A weight-only quantization algorithm yields the best results on language generation tasks (C4 and WikiText-2) for a budget of $3$ bits without any training.
arXiv Detail & Related papers (2024-05-21T18:50:51Z) - Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z) - RSC: Accelerating Graph Neural Networks Training via Randomized Sparse
Computations [56.59168541623729]
Training graph neural networks (GNNs) is time-consuming because sparse graph-based operations are difficult to accelerate in hardware.
We explore trading off the computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which for the first time demonstrates the potential of training GNNs with approximated operations.
arXiv Detail & Related papers (2022-10-19T17:25:33Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new view of neural networks as a set of binary search trees, where each training iteration corresponds to modifying a small subset of the nodes in the trees.
We believe this view would have further applications in the design and analysis of deep neural networks (DNNs).
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Monarch: Expressive Structured Matrices for Efficient and Accurate
Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z) - Training Multi-Layer Over-Parametrized Neural Network in Subquadratic
Time [12.348083977777833]
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
In this work, we show how to reduce the training cost per iteration.
arXiv Detail & Related papers (2021-12-14T18:13:36Z) - Sub-Linear Memory: How to Make Performers SLiM [38.068090269482425]
Vanilla Transformers require $O(L^2)$ in serial time and memory as functions of the input length $L$.
Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ for serial computation.
We observe a remarkable computational flexibility: forward and backward propagation can be performed with no approximations using sublinear memory.
arXiv Detail & Related papers (2020-12-21T13:56:04Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Provably Efficient Reinforcement Learning for Discounted MDPs with
Feature Mapping [99.59319332864129]
In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs).
We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde{O}(d\sqrt{T}/(1-\gamma)^2)$ regret.
Our upper and lower bound results together suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
arXiv Detail & Related papers (2020-06-23T17:08:54Z)