Partial Tensorized Transformers for Natural Language Processing
- URL: http://arxiv.org/abs/2310.20077v1
- Date: Mon, 30 Oct 2023 23:19:06 GMT
- Title: Partial Tensorized Transformers for Natural Language Processing
- Authors: Subhadra Vadlamannati, Ryan Solgi
- Abstract summary: We study the use of tensor-train decomposition to improve the accuracy of and compress vision-language neural networks, namely BERT and ViT.
Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks due to its unprecedented accuracy. However, its extensive memory and parameter requirements often hinder practical applications. In this work, we study the use of tensor-train decomposition to improve the accuracy of and compress transformer vision-language neural networks, namely BERT and ViT. We focus on both embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach. Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments, breaking new ground in the field of tensor decomposition.
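For intuition, here is a minimal sketch of the kind of embedding-layer compression tensor-train decomposition enables: the weight table is reshaped into a higher-order tensor and factored by the standard TT-SVD procedure. The shape factorization, ranks, and NumPy implementation are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of tensor-train (TT) compression of an embedding-style weight
# matrix via the standard TT-SVD algorithm (sequential truncated SVDs).
# The tensor shape and max_rank below are illustrative assumptions only.
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into a list of 3-way TT cores."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        mat = mat.reshape(rank * dims[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, len(S))
        cores.append(U[:, :r_new].reshape(rank, dims[k], r_new))
        mat = np.diag(S[:r_new]) @ Vt[:r_new]   # carry the residual factor forward
        rank = r_new
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

# Toy "embedding table": 32*32*32 = 32768 rows of dimension 8*8 = 64,
# viewed as a 5-way tensor before decomposition.
W = np.random.randn(32, 32, 32, 8, 8)
cores = tt_svd(W, max_rank=16)
print(sum(c.size for c in cores), "TT parameters vs", W.size, "dense parameters")
```

Even for this toy tensor, the TT cores hold roughly two orders of magnitude fewer parameters than the dense table; the accuracy effects studied in the paper come from where and how such cores replace the original layers.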
Related papers
- Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware [78.17783007774295]
This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval.
A novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed.
The converted SNNs achieve an almost five-fold improvement in power efficiency, at a moderate performance loss, compared to the original CNNs.
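As background on conversion-based spiking approaches in general (not the complex-valued scheme developed in that paper), a hedged sketch: an activation normalized to [0, 1] can be approximated by the firing rate of an integrate-and-fire neuron.

```python
# Generic illustration of conversion-based (rate-coded) spiking: an
# integrate-and-fire neuron whose firing rate approximates a ReLU activation
# normalized to [0, 1]. This is the textbook conversion idea, not the paper's
# complex-valued method; threshold and timestep count are arbitrary choices.
def if_firing_rate(activation, timesteps=200, threshold=1.0):
    """Approximate a normalized activation by an IF neuron's spike rate."""
    v, spikes = 0.0, 0
    for _ in range(timesteps):
        v += activation            # constant input current each step
        if v >= threshold:
            spikes += 1            # emit a spike and reset by subtraction
            v -= threshold
    return spikes / timesteps

for a in (0.0, 0.25, 0.5, 0.9):
    print(a, "->", if_firing_rate(a))
```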
arXiv Detail & Related papers (2024-12-05T09:41:33Z)
- Neural Functional Transformers [99.98750156515437]
This paper uses the attention mechanism to define a novel set of permutation-equivariant weight-space layers called neural functional Transformers (NFTs).
NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which have exhibited remarkable success across multiple domains.
We also leverage NFTs to develop Inr2Array, a novel method for computing permutation-invariant representations from the weights of implicit neural representations (INRs).
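The weight-space symmetry those layers respect can be checked directly: permuting an MLP's hidden units, while permuting the next layer's columns accordingly, leaves the network's function unchanged, as in this small hypothetical example.

```python
# Hypothetical check of the weight-space permutation symmetry that NFTs are
# designed to respect: permuting the hidden units of a two-layer ReLU MLP
# (and un-permuting the next layer's columns) does not change its output.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))    # hidden x input
W2 = rng.standard_normal((4, 16))    # output x hidden
x = rng.standard_normal(8)

def mlp(w1, w2, inp):
    return w2 @ np.maximum(w1 @ inp, 0.0)

P = np.eye(16)[rng.permutation(16)]  # random permutation matrix
assert np.allclose(mlp(W1, W2, x), mlp(P @ W1, W2 @ P.T, x))
print("outputs identical under hidden-unit permutation")
```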
arXiv Detail & Related papers (2023-05-22T23:38:27Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Tensor Decomposition for Model Reduction in Neural Networks: A Review [13.96938227911258]
Modern neural networks have revolutionized the fields of computer vision (CV) and natural language processing (NLP).
They are widely used for solving complex CV and NLP tasks such as image classification, image generation, and machine translation.
This paper reviews six tensor decomposition methods and illustrates their ability to compress model parameters.
arXiv Detail & Related papers (2023-04-26T13:12:00Z)
- Variational Tensor Neural Networks for Deep Learning [0.0]
We propose an integration of tensor networks (TNs) into deep neural networks (NNs).
This, in turn, results in a scalable tensor neural network (TNN) architecture capable of efficient training over a large parameter space.
We validate the accuracy and efficiency of our method by designing TNN models and providing benchmark results for linear and non-linear regressions, data classification and image recognition on MNIST handwritten digits.
arXiv Detail & Related papers (2022-11-26T20:24:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- BiT: Robustly Binarized Multi-distilled Transformer [36.06192421902272]
These approaches allow, for the first time, fully binarized transformer models at a practical level of accuracy, approaching a full-precision BERT baseline to within as little as 5.9%.
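A minimal sketch of the underlying mechanism: sign-based weight binarization with a per-tensor scale and a straight-through estimator (STE), kept generic rather than reproducing BiT's multi-distillation training recipe.

```python
# Generic weight binarization with a straight-through estimator (STE),
# sketched in PyTorch. This shows the basic 1-bit quantization mechanism only,
# not BiT's multi-distillation procedure; shapes are illustrative.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        scale = w.abs().mean()                 # per-tensor scaling factor
        return scale * torch.sign(w)           # weights become +/- scale
    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1.0).float()  # pass gradients through, clipped

w = torch.randn(64, 64, requires_grad=True)
w_bin = BinarizeSTE.apply(w)                   # binarized weights for the forward pass
loss = w_bin.sum()
loss.backward()
print(w_bin.unique().numel(), "distinct weight values; grad shape:", tuple(w.grad.shape))
```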
arXiv Detail & Related papers (2022-05-25T19:01:54Z)
- Mixed Precision Quantization of Transformer Language Models for Speech Recognition [67.95996816744251]
State-of-the-art neural language models represented by Transformers are becoming increasingly complex and expensive for practical applications.
Current low-bit quantization methods are based on uniform precision and fail to account for the varying sensitivity of different parts of the system to quantization errors.
The optimal local precision settings are automatically learned using two techniques.
Experiments were conducted on the Penn Treebank (PTB) corpus and a Switchboard-trained LF-MMI TDNN system.
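To make the uniform-precision baseline concrete, here is a hedged sketch of symmetric b-bit quantization of a weight tensor; a mixed-precision scheme would choose b per layer or component rather than globally. The bit widths and toy tensor are illustrative assumptions.

```python
# Symmetric uniform quantization of a weight tensor to b bits: the baseline
# that mixed-precision methods refine by learning b per layer/component.
import numpy as np

def quantize_dequantize(w, bits):
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax              # map the max magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                            # dequantized approximation

w = np.random.randn(512, 512)
for bits in (8, 4, 2):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```

Lower bit widths shrink memory but raise the reconstruction error, which is why sensitivity-aware (mixed) precision allocation matters.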
arXiv Detail & Related papers (2021-11-29T09:57:00Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
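The recurrent counterpart exploits the fact that kernelized (linear) attention can be computed from running sums carried as a constant-size state. The sketch below uses the common elu(x)+1 feature map as an assumption; it illustrates the general linear-attention-as-RNN idea, not necessarily that paper's exact variant.

```python
# Hedged sketch of why linear (kernelized) attention admits an RNN-style form:
# for causal attention, out_t = phi(q_t) @ S_t / (phi(q_t) @ z_t), where S_t
# and z_t are running sums updated one token at a time (constant-size state).
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, strictly positive

def linear_attention_recurrent(Q, K, V):
    d, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d, d_v))      # running sum of phi(k_t) v_t^T
    z = np.zeros(d)             # running sum of phi(k_t)
    outputs = []
    for q, k, v in zip(Q, K, V):
        S += np.outer(phi(k), v)
        z += phi(k)
        outputs.append(phi(q) @ S / (phi(q) @ z))
    return np.array(outputs)

T, d, d_v = 6, 4, 3
rng = np.random.default_rng(1)
out = linear_attention_recurrent(rng.standard_normal((T, d)),
                                 rng.standard_normal((T, d)),
                                 rng.standard_normal((T, d_v)))
print(out.shape)   # (6, 3): one output per token, no quadratic attention matrix
```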
arXiv Detail & Related papers (2021-03-24T10:50:43Z)