Hierarchical Attention Transformer Architecture For Syntactic Spell Correction
- URL: http://arxiv.org/abs/2005.04876v1
- Date: Mon, 11 May 2020 06:19:01 GMT
- Title: Hierarchical Attention Transformer Architecture For Syntactic Spell Correction
- Authors: Abhishek Niranjan, M Ali Basha Shaik, Kushal Verma
- Abstract summary: We propose a multi-encoder, single-decoder variation of the conventional transformer.
We report significant improvements of 0.11%, 0.32% and 0.69% in character (CER), word (WER) and sentence (SER) error rates.
Our architecture also trains about 7.8 times faster and is only about one third the size of the next most accurate model.
- Score: 1.0312968200748118
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Attention mechanisms are playing a central role in recent advances on
sequence-to-sequence problems. The Transformer architecture achieved new
state-of-the-art results in machine translation, and its variants have since been
introduced in several other sequence-to-sequence problems. Problems that involve a
shared vocabulary can benefit from the similar semantic and syntactic structure of the
source and target sentences. With the motivation of building a reliable and fast
post-processing textual module to assist text-related use cases on mobile phones, we
take on the popular spell correction problem. In this paper, we propose a
multi-encoder, single-decoder variation of the conventional transformer. Outputs from
three encoders, fed with character-level 1-gram, 2-gram and 3-gram inputs, are attended
to in a hierarchical fashion in the decoder. The context vectors from the encoders,
combined with self-attention, amplify the n-gram properties at the character level and
help in accurate decoding. We demonstrate our model on a spell correction dataset from
Samsung Research, and report significant improvements of 0.11%, 0.32% and 0.69% in
character (CER), word (WER) and sentence (SER) error rates over existing
state-of-the-art machine-translation architectures. Our architecture also trains ~7.8
times faster and is only about one third the size of the next most accurate model.
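As a concrete illustration of the hierarchical multi-encoder attention described above, here is a minimal PyTorch sketch, assuming one plausible reading of the abstract: three character-level encoders (1-gram, 2-gram and 3-gram inputs) whose outputs are cross-attended one after another inside each decoder layer. Class names, dimensions and the exact ordering of the attention sub-layers are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed reading, not the authors' code): a decoder layer that
# attends over three encoder memories (char 1-gram, 2-gram and 3-gram encoders)
# sequentially, on top of the usual masked self-attention.
import torch
import torch.nn as nn

def char_ngrams(text, n):
    # Assumed input preparation: e.g. char_ngrams("cats", 2) -> ["ca", "at", "ts"]
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class HierarchicalDecoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
        # One cross-attention block per encoder (1-gram, 2-gram, 3-gram).
        self.cross_attns = nn.ModuleList([
            nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
            for _ in range(3)
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(5)])
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Dropout(dropout), nn.Linear(d_ff, d_model))

    def forward(self, tgt, memories, tgt_mask=None):
        # tgt: [batch, tgt_len, d_model]; memories: list of three encoder outputs.
        x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
        x = self.norms[0](tgt + x)
        # Hierarchical (sequential) cross-attention over the n-gram encoder outputs.
        for i, (attn, mem) in enumerate(zip(self.cross_attns, memories)):
            y, _ = attn(x, mem, mem)
            x = self.norms[1 + i](x + y)
        return self.norms[4](x + self.ffn(x))
```

A full model would stack several such layers, with encoder i fed the character i-gram sequence of the noisy input; how the paper actually combines the three context vectors may differ from this sequential residual scheme.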
Related papers
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims to generate the target sequence based on a given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
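For readers unfamiliar with the distinction, here is a small hypothetical sketch of how the two formulations consume a seq2seq example; the tokens and the <sep> marker are placeholders, not any particular model's vocabulary.

```python
# Hypothetical illustration of the two seq2seq formulations discussed above.
source = ["corect", "speling", "pleese"]      # noisy input tokens
target = ["correct", "spelling", "please"]    # desired output tokens

# Encoder-decoder: the encoder reads the source; the decoder generates the target
# while cross-attending to the encoder states.
encoder_input = source
decoder_input = ["<bos>"] + target
decoder_labels = target + ["<eos>"]

# Decoder-only: source and target are concatenated into a single sequence; a causal
# LM reads it left to right and the loss is applied only to the target positions
# (next-token shifting omitted for brevity).
lm_input = source + ["<sep>"] + target + ["<eos>"]
loss_mask = [0] * (len(source) + 1) + [1] * (len(target) + 1)

print(lm_input)   # ['corect', 'speling', 'pleese', '<sep>', 'correct', 'spelling', 'please', '<eos>']
print(loss_mask)  # [0, 0, 0, 0, 1, 1, 1, 1]
```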
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance [15.72669617789124]
Scene text recognition (STR) is an important bridge between images and text.
Recent methods use a frozen initial embedding to guide the decoder to decode the features to text, leading to a loss of accuracy.
We propose a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG).
arXiv Detail & Related papers (2021-11-16T09:10:39Z)
- Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems [38.672160430296536]
Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization.
Previous work has focused on one important bottleneck, the quadratic self-attention mechanism in the encoder.
This work focuses on the transformer's encoder-decoder attention mechanism.
arXiv Detail & Related papers (2021-09-08T19:32:42Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
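A rough sketch, under my own assumptions, of the recipe this summary describes: a frozen pretrained encoder, a small trainable bottleneck that pools its states into one sentence vector, and a single-layer transformer decoder trained to reconstruct the clean sentence from a corrupted input. The frozen encoder below is a stand-in module, not the specific pretrained LM used in the paper, and the sizes are placeholders.

```python
# Sketch only: frozen encoder + trainable sentence bottleneck + 1-layer decoder,
# trained with a denoising reconstruction objective.
import torch
import torch.nn as nn

class SentenceBottleneckAE(nn.Module):
    def __init__(self, frozen_encoder, d_model=768, vocab_size=32000):
        super().__init__()
        self.encoder = frozen_encoder                   # pretrained LM, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.pool = nn.Linear(d_model, d_model)         # trainable sentence bottleneck
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, noisy_embeds, tgt_embeds, tgt_mask=None):
        # noisy_embeds: embeddings of the corrupted sentence, [batch, src_len, d_model].
        with torch.no_grad():
            states = self.encoder(noisy_embeds)
        sent = torch.tanh(self.pool(states.mean(dim=1, keepdim=True)))  # [batch, 1, d]
        out = self.decoder(tgt_embeds, memory=sent, tgt_mask=tgt_mask)
        return self.lm_head(out)   # logits for reconstructing the clean sentence
```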
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Contextual Transformer Networks for Visual Recognition [103.79062359677452]
We design a novel Transformer-style module, i.e., Contextual Transformer (CoT) block, for visual recognition.
Such a design fully capitalizes on the contextual information among input keys to guide the learning of the dynamic attention matrix.
Our CoT block is appealing in that it can readily replace each $3\times3$ convolution in ResNet architectures.
arXiv Detail & Related papers (2021-07-26T16:00:21Z)
- Reinforcement Learning for on-line Sequence Transformation [0.0]
We introduce an architecture that learns, via reinforcement, to decide whether to read an input token or to write an output token.
In an experimental study we compare it with state-of-the-art methods for neural machine translation.
arXiv Detail & Related papers (2021-05-28T20:31:25Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- Efficient Wait-k Models for Simultaneous Machine Translation [46.01342928010307]
Simultaneous machine translation consists of starting output generation before the entire input sequence is available.
Wait-k decoders offer a simple but efficient approach to this problem.
We investigate the behavior of wait-k decoding in low-resource settings for spoken corpora using IWSLT datasets.
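Wait-k decoding follows a simple schedule: read the first k source tokens, then alternate between writing one target token and reading one more source token, and once the source is exhausted, write the remaining target tokens. A small sketch of that schedule (function name and action labels are mine):

```python
def wait_k_schedule(k, src_len, tgt_len):
    """Yield a READ/WRITE action sequence for wait-k simultaneous decoding.

    The decoder first READs k source tokens, then alternates WRITE/READ;
    once the source is exhausted, it WRITEs the remaining target tokens.
    """
    read, written = 0, 0
    while written < tgt_len:
        # Keep the source k tokens ahead of the target while source remains.
        while read < min(written + k, src_len):
            yield "READ"
            read += 1
        yield "WRITE"
        written += 1

# Example: k=2 with 5 source tokens and 5 target tokens.
print(list(wait_k_schedule(2, src_len=5, tgt_len=5)))
# ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```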
arXiv Detail & Related papers (2020-05-18T11:14:23Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing $L_0$ penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
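The summary describes a gate layer between encoder and decoder whose expected L0 norm is penalized. A simplified sketch of that idea, using a plain sigmoid/concrete relaxation; the gate parameterization, hyper-parameters and names are my assumptions, not the paper's exact formulation:

```python
# Simplified sketch: per-position gates on encoder outputs, trained with an
# expected-L0-style penalty so that many source positions are shut off.
import torch
import torch.nn as nn

class EncoderOutputGate(nn.Module):
    def __init__(self, d_model=512, temperature=0.5):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # per-position gate logit
        self.temperature = temperature

    def forward(self, enc_out):
        # enc_out: [batch, src_len, d_model]
        logits = self.scorer(enc_out).squeeze(-1)            # [batch, src_len]
        if self.training:
            # Gumbel-sigmoid relaxation keeps the gates differentiable.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)
            gate = torch.sigmoid((logits + noise) / self.temperature)
        else:
            gate = (logits > 0).float()                       # hard gates at test time
        # Expected number of open gates, used as the sparsity penalty.
        expected_l0 = torch.sigmoid(logits).sum(dim=-1).mean()
        return enc_out * gate.unsqueeze(-1), expected_l0

# Training loss would combine the usual NLL with lambda * expected_l0.
```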
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
- Consistent Multiple Sequence Decoding [36.46573114422263]
We introduce a consistent multiple sequence decoding architecture.
This architecture allows for consistent and simultaneous decoding of an arbitrary number of sequences.
We show the efficacy of our consistent multiple sequence decoder on the task of dense relational image captioning.
arXiv Detail & Related papers (2020-04-02T00:43:54Z)
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
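Fixed attention patterns in this line of work are simple positional heuristics, such as attending to the current, previous or next token. A hypothetical sketch of how one such non-learnable head can be built; the particular offsets here are illustrative and not necessarily the patterns used in the paper:

```python
# Sketch: a fixed, non-learnable attention "head" defined by a positional offset,
# e.g. offset=-1 attends to the previous token, offset=0 to the token itself.
import torch

def fixed_offset_attention(values, offset):
    """values: [batch, seq_len, d]; returns, per position i, the value at i+offset."""
    batch, seq_len, _ = values.shape
    pattern = torch.zeros(seq_len, seq_len)
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)   # clamp at sentence boundaries
        pattern[i, j] = 1.0                        # one-hot attention row, no parameters
    return pattern @ values                        # broadcast matmul -> [batch, seq_len, d]

x = torch.randn(2, 6, 8)
prev_tok = fixed_offset_attention(x, offset=-1)    # each position copies its left neighbor
next_tok = fixed_offset_attention(x, offset=+1)    # ... or its right neighbor
```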