A text autoencoder from transformer for fast encoding language
representation
- URL: http://arxiv.org/abs/2111.02844v1
- Date: Thu, 4 Nov 2021 13:09:10 GMT
- Title: A text autoencoder from transformer for fast encoding language
representation
- Authors: Tan Huang
- Abstract summary: We propose a deep bidirectional language model that uses a window masking mechanism at the attention layer.
This work computes contextual language representations without the random masking used in BERT.
Our method has O(n) complexity, compared with the O($n^2$) of other transformer-based models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years BERT has shown clear advantages and great potential in natural
language processing tasks. However, both training and applying BERT require
intensive time and resources for computing contextual language representations,
which hinders its universality and applicability. To overcome this bottleneck,
we propose a deep bidirectional language model that uses a window masking
mechanism at the attention layer. This work computes contextual language
representations without the random masking used in BERT and maintains a deep
bidirectional architecture like BERT's. To compute the same sentence
representation, our method requires O(n) complexity, compared with the O($n^2$)
of other transformer-based models. To further demonstrate its superiority, we
compute contextual language representations in CPU environments; using the
embeddings from the proposed method, logistic regression achieves much higher
accuracy on SMS classification. Moreover, the proposed method also achieves
significantly higher performance on semantic similarity tasks.
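The abstract does not spell out the exact form of the window mask, so the following is only a minimal sketch of the general idea, assuming (as in the related T-TA work listed below) that each position attends to its context but not to itself, so every token's representation is produced in a single forward pass. The tensor shapes, names, and PyTorch framing are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention in which each position is blocked from
    attending to itself, so its output is built purely from context.
    x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_model) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-1, -2) / (x.size(-1) ** 0.5)  # (seq_len, seq_len)
    # Hypothetical "window" mask: here simply the diagonal, i.e. no token may
    # attend to its own position (an assumption, not the paper's exact mask).
    mask = torch.eye(x.size(0), dtype=torch.bool)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                    # (seq_len, d_model)

# Toy usage: 5 tokens, model width 8.
d = 8
x = torch.randn(5, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = masked_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 8])
```

Because every position is filled in one pass, a length-n sentence needs a single encoder run rather than one run per masked token, which appears to be the source of the claimed complexity advantage over BERT-style repeated masking.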
Related papers
- Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems often struggle to retrieve context across different text modalities due to semantic gaps.
We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps.
Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z)
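The summary above describes a lightweight, adapter-style projection that maps embeddings from one space into another. The sketch below shows one common way such a projection could be trained on paired embeddings; the dimensions, cosine loss, and training loop are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch import nn

# Hypothetical dimensions: source embeddings are 384-d, target embeddings 768-d.
projection = nn.Linear(384, 768)          # the entire "adapter" is one matrix
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-3)

# Toy paired data standing in for aligned (source, target) embedding pairs.
src = torch.randn(256, 384)
tgt = torch.randn(256, 768)

for step in range(100):
    optimizer.zero_grad()
    pred = projection(src)
    # Cosine alignment loss: push projected source embeddings toward targets.
    loss = 1.0 - nn.functional.cosine_similarity(pred, tgt, dim=-1).mean()
    loss.backward()
    optimizer.step()
```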
- Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
Transformer architecture is shown to provide a powerful machine framework for online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
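The XDBERT summary describes distilling visual knowledge from a pretrained multimodal encoder into a language encoder. The fragment below is only a generic illustration of hidden-state distillation of that kind; the stand-in encoders, layer choice, and MSE objective are assumed for the example rather than taken from the paper.

```python
import torch
from torch import nn

# Stand-ins for a pretrained multimodal teacher and a language-only student.
# In practice these would be transformer encoders; random linear maps are used
# here only to make the objective concrete.
teacher = nn.Linear(768, 768)   # frozen cross-modal encoder (assumed)
student = nn.Linear(768, 768)   # language encoder being trained
for p in teacher.parameters():
    p.requires_grad_(False)

mse = nn.MSELoss()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

token_embeddings = torch.randn(32, 128, 768)    # (batch, seq_len, hidden)
with torch.no_grad():
    target_states = teacher(token_embeddings)   # teacher hidden states

optimizer.zero_grad()
distill_loss = mse(student(token_embeddings), target_states)
distill_loss.backward()
optimizer.step()
```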
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT)
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
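CsaNMT's "adjacency region" is described only at a high level above. One simple way to picture it is sampling perturbed sentence vectors inside a small neighbourhood of the original representation, as in the toy sketch below; the spherical neighbourhood and its radius are illustrative assumptions, not the paper's actual construction.

```python
import torch

def sample_adjacent(sentence_vec, radius=0.1, num_samples=4):
    """Draw vectors from a small neighbourhood around a sentence representation,
    standing in for 'variants of literal expression under the same meaning'."""
    noise = torch.randn(num_samples, sentence_vec.size(-1))
    noise = radius * noise / noise.norm(dim=-1, keepdim=True)  # points on a small sphere
    return sentence_vec.unsqueeze(0) + noise

vec = torch.randn(512)            # a hypothetical sentence embedding
variants = sample_adjacent(vec)   # (4, 512) augmented training vectors
print(variants.shape)
```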
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
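The GroupBERT summary above says a convolutional module is added alongside self-attention so that local and global interactions are learned separately. The block below sketches one plausible arrangement of that idea; the kernel size, module ordering, and the omission of grouped projections and other GroupBERT details are simplifications of mine.

```python
import torch
from torch import nn

class ConvPlusAttentionBlock(nn.Module):
    """Toy encoder block: a depthwise convolution for local interactions
    followed by self-attention for global interactions."""
    def __init__(self, d_model=256, n_heads=4, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + local)              # local mixing
        global_, _ = self.attn(x, x, x)
        return self.norm2(x + global_)         # global mixing

block = ConvPlusAttentionBlock()
out = block(torch.randn(2, 16, 256))
print(out.shape)   # torch.Size([2, 16, 256])
```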
- Demystifying BERT: Implications for Accelerator Design [4.80595971865854]
We focus on BERT, one of the most popular NLP transfer learning algorithms, to identify how its algorithmic behavior can guide future accelerator design.
We characterize compute-intensive BERT computations and discuss software and possible hardware mechanisms to further optimize these computations.
Overall, our analysis identifies holistic solutions to optimize systems for BERT-like models.
arXiv Detail & Related papers (2021-04-14T01:06:49Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference dataset and word/phrase-level representations from paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
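BURT's summary mentions a Siamese (twin) network that maps inputs of any granularity to fixed-size vectors. The sketch below illustrates the general twin-encoder pattern, with a shared encoder applied to both sides of a pair and a cosine score; the tiny bag-of-embeddings encoder and the scoring choice are placeholders of mine, not BURT's actual architecture.

```python
import torch
from torch import nn

class TwinEncoder(nn.Module):
    """Shared encoder applied to both sides of a pair (Siamese setup)."""
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def encode(self, token_ids):                    # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # fixed-size mean pooling
        return self.proj(pooled)                    # (batch, dim)

    def forward(self, left_ids, right_ids):
        a, b = self.encode(left_ids), self.encode(right_ids)
        return nn.functional.cosine_similarity(a, b, dim=-1)

model = TwinEncoder()
left = torch.randint(0, 30522, (8, 12))
right = torch.randint(0, 30522, (8, 12))
scores = model(left, right)        # one similarity score per pair
print(scores.shape)                # torch.Size([8])
```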
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables when paired with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
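The preceding entry's point that VAEs tend to ignore latent variables refers to posterior collapse: the KL term of the objective can be driven to (near) zero, at which point the decoder stops using the latent code. The snippet below only computes the standard Gaussian KL term to make that failure mode concrete; it is not the discrete-bottleneck remedy the paper proposes.

```python
import torch

def gaussian_kl(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims, per example."""
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

# When the posterior matches the prior (mu ~ 0, logvar ~ 0), KL ~ 0 and the
# latent variable carries no information -- the collapse described above.
mu = torch.zeros(4, 32)
logvar = torch.zeros(4, 32)
print(gaussian_kl(mu, logvar))     # tensor of zeros: a fully collapsed posterior
```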
- Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning [31.897630023454067]
We propose a novel deep bidirectional language model called Transformer-based Text Autoencoder (T-TA)
The T-TA computes contextual language representations without repetition and has benefits of the deep bidirectional architecture like BERT.
In run-time experiments on CPU environments, the proposed T-TA performs over six times faster than the BERT-based model in the reranking task and twelve times faster in the semantic similarity task.
arXiv Detail & Related papers (2020-04-17T07:43:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.