A text autoencoder from transformer for fast encoding language
representation
- URL: http://arxiv.org/abs/2111.02844v1
- Date: Thu, 4 Nov 2021 13:09:10 GMT
- Title: A text autoencoder from transformer for fast encoding language
representation
- Authors: Tan Huang
- Abstract summary: We propose a deep bidirectional language model that uses a window masking mechanism at the attention layer.
This work computes contextual language representations without the random masking used in BERT.
Our method has O(n) complexity, compared with the O($n^2$) of other transformer-based models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years BERT has shown clear advantages and great potential in natural
language processing tasks. However, both training and applying BERT require
intensive time and resources for computing contextual language representations,
which hinders its universality and applicability. To overcome this bottleneck,
we propose a deep bidirectional language model that uses a window masking
mechanism at the attention layer. This work computes contextual language
representations without the random masking used in BERT and maintains a deep
bidirectional architecture like BERT's. To compute the same sentence
representation, our method requires O(n) complexity, compared with the O($n^2$)
of other transformer-based models. To further demonstrate its superiority, we
compute contextual language representations in CPU environments; using the
embeddings from the proposed method, logistic regression achieves much higher
accuracy on SMS classification. Moreover, the proposed method also achieves
significantly higher performance on semantic similarity tasks.
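The abstract does not spell out the exact form of the window mask, so the following is only a minimal sketch of the general idea, assuming (as in the related T-TA work listed below) that each position attends to its context but not to itself, so every token's representation is produced in a single forward pass. The tensor shapes, names, and PyTorch framing are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention in which each position is blocked from
    attending to itself, so its output is built purely from context.
    x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_model) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-1, -2) / (x.size(-1) ** 0.5)  # (seq_len, seq_len)
    # Hypothetical "window" mask: here simply the diagonal, i.e. no token may
    # attend to its own position (an assumption, not the paper's exact mask).
    mask = torch.eye(x.size(0), dtype=torch.bool)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                    # (seq_len, d_model)

# Toy usage: 5 tokens, model width 8.
d = 8
x = torch.randn(5, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = masked_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 8])
```

Because every position is filled in one pass, a length-n sentence needs a single encoder run rather than one run per masked token, which appears to be the source of the claimed complexity advantage over BERT-style repeated masking.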
Related papers
- Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems often struggle to retrieve context across different text modalities due to semantic gaps.
We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps.
Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z)
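The summary above describes a lightweight, adapter-style projection that maps embeddings from one space into another. The sketch below shows one common way such a projection could be trained on paired embeddings; the dimensions, cosine loss, and training loop are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch import nn

# Hypothetical dimensions: source embeddings are 384-d, target embeddings 768-d.
projection = nn.Linear(384, 768)          # the entire "adapter" is one matrix
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-3)

# Toy paired data standing in for aligned (source, target) embedding pairs.
src = torch.randn(256, 384)
tgt = torch.randn(256, 768)

for step in range(100):
    optimizer.zero_grad()
    pred = projection(src)
    # Cosine alignment loss: push projected source embeddings toward targets.
    loss = 1.0 - nn.functional.cosine_similarity(pred, tgt, dim=-1).mean()
    loss.backward()
    optimizer.step()
```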
- Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
Transformer architecture is shown to provide a powerful machine framework for online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
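The XDBERT summary describes distilling visual knowledge from a pretrained multimodal encoder into a language encoder. The fragment below is only a generic illustration of hidden-state distillation of that kind; the stand-in encoders, layer choice, and MSE objective are assumed for the example rather than taken from the paper.

```python
import torch
from torch import nn

# Stand-ins for a pretrained multimodal teacher and a language-only student.
# In practice these would be transformer encoders; random linear maps are used
# here only to make the objective concrete.
teacher = nn.Linear(768, 768)   # frozen cross-modal encoder (assumed)
student = nn.Linear(768, 768)   # language encoder being trained
for p in teacher.parameters():
    p.requires_grad_(False)

mse = nn.MSELoss()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

token_embeddings = torch.randn(32, 128, 768)    # (batch, seq_len, hidden)
with torch.no_grad():
    target_states = teacher(token_embeddings)   # teacher hidden states

optimizer.zero_grad()
distill_loss = mse(student(token_embeddings), target_states)
distill_loss.backward()
optimizer.step()
```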
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT)
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
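CsaNMT's "adjacency region" is described only at a high level above. One simple way to picture it is sampling perturbed sentence vectors inside a small neighbourhood of the original representation, as in the toy sketch below; the spherical neighbourhood and its radius are illustrative assumptions, not the paper's actual construction.

```python
import torch

def sample_adjacent(sentence_vec, radius=0.1, num_samples=4):
    """Draw vectors from a small neighbourhood around a sentence representation,
    standing in for 'variants of literal expression under the same meaning'."""
    noise = torch.randn(num_samples, sentence_vec.size(-1))
    noise = radius * noise / noise.norm(dim=-1, keepdim=True)  # points on a small sphere
    return sentence_vec.unsqueeze(0) + noise

vec = torch.randn(512)            # a hypothetical sentence embedding
variants = sample_adjacent(vec)   # (4, 512) augmented training vectors
print(variants.shape)
```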
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
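The GroupBERT summary above says a convolutional module is added alongside self-attention so that local and global interactions are learned separately. The block below sketches one plausible arrangement of that idea; the kernel size, module ordering, and the omission of grouped projections and other GroupBERT details are simplifications of mine.

```python
import torch
from torch import nn

class ConvPlusAttentionBlock(nn.Module):
    """Toy encoder block: a depthwise convolution for local interactions
    followed by self-attention for global interactions."""
    def __init__(self, d_model=256, n_heads=4, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + local)              # local mixing
        global_, _ = self.attn(x, x, x)
        return self.norm2(x + global_)         # global mixing

block = ConvPlusAttentionBlock()
out = block(torch.randn(2, 16, 256))
print(out.shape)   # torch.Size([2, 16, 256])
```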
- Demystifying BERT: Implications for Accelerator Design [4.80595971865854]
We focus on BERT, one of the most popular NLP transfer learning algorithms, to identify how its algorithmic behavior can guide future accelerator design.
We characterize compute-intensive BERT computations and discuss software and possible hardware mechanisms to further optimize these computations.
Overall, our analysis identifies holistic solutions to optimize systems for BERT-like models.
arXiv Detail & Related papers (2021-04-14T01:06:49Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference dataset and word/phrase-level representations from paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
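BURT's summary mentions a Siamese (twin) network that maps inputs of any granularity to fixed-size vectors. The sketch below illustrates the general twin-encoder pattern, with a shared encoder applied to both sides of a pair and a cosine score; the tiny bag-of-embeddings encoder and the scoring choice are placeholders of mine, not BURT's actual architecture.

```python
import torch
from torch import nn

class TwinEncoder(nn.Module):
    """Shared encoder applied to both sides of a pair (Siamese setup)."""
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def encode(self, token_ids):                    # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # fixed-size mean pooling
        return self.proj(pooled)                    # (batch, dim)

    def forward(self, left_ids, right_ids):
        a, b = self.encode(left_ids), self.encode(right_ids)
        return nn.functional.cosine_similarity(a, b, dim=-1)

model = TwinEncoder()
left = torch.randint(0, 30522, (8, 12))
right = torch.randint(0, 30522, (8, 12))
scores = model(left, right)        # one similarity score per pair
print(scores.shape)                # torch.Size([8])
```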
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables when paired with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
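The preceding entry's point that VAEs tend to ignore latent variables refers to posterior collapse: the KL term of the objective can be driven to (near) zero, at which point the decoder stops using the latent code. The snippet below only computes the standard Gaussian KL term to make that failure mode concrete; it is not the discrete-bottleneck remedy the paper proposes.

```python
import torch

def gaussian_kl(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims, per example."""
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

# When the posterior matches the prior (mu ~ 0, logvar ~ 0), KL ~ 0 and the
# latent variable carries no information -- the collapse described above.
mu = torch.zeros(4, 32)
logvar = torch.zeros(4, 32)
print(gaussian_kl(mu, logvar))     # tensor of zeros: a fully collapsed posterior
```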
- Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning [31.897630023454067]
We propose a novel deep bidirectional language model called Transformer-based Text Autoencoder (T-TA)
The T-TA computes contextual language representations without repetition and has benefits of the deep bidirectional architecture like BERT.
In run-time experiments on CPU environments, the proposed T-TA performs over six times faster than the BERT-based model in the reranking task and twelve times faster in the semantic similarity task.
arXiv Detail & Related papers (2020-04-17T07:43:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.