Context- and Sequence-Aware Convolutional Recurrent Encoder for Neural
Machine Translation
- URL: http://arxiv.org/abs/2101.04030v2
- Date: Sun, 21 Mar 2021 07:55:51 GMT
- Title: Context- and Sequence-Aware Convolutional Recurrent Encoder for Neural
Machine Translation
- Authors: Ritam Mallick, Seba Susan, Vaibhaw Agrawal, Rizul Garg, Prateek Rawal
- Abstract summary: Existing models use recurrent neural networks to construct the encoder and decoder modules.
In alternative research, the recurrent networks were substituted by convolutional neural networks for capturing the syntactic structure in the input sentence.
We combine the strengths of both approaches by proposing a convolutional-recurrent encoder that captures both context and sequential information.
- Score: 2.729898906885749
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A Neural Machine Translation model is a sequence-to-sequence converter based on
neural networks. Existing models use recurrent neural networks to construct
both the encoder and decoder modules. In alternative research, the recurrent
networks were substituted by convolutional neural networks for capturing the
syntactic structure in the input sentence and decreasing the processing time.
We combine the strengths of both approaches by proposing a
convolutional-recurrent encoder that captures both the context information and
the sequential information from the source sentence. Word embedding and
position embedding of the source sentence are performed prior to the
convolutional encoding layer, which is essentially an n-gram feature extractor
capturing phrase-level context information. The rectified output of the
convolutional encoding layer is added to the original embedding vector, and the
sum is normalized by layer normalization. The normalized output is given as a
sequential input to the recurrent encoding layer that captures the temporal
information in the sequence. For the decoder, we use an attention-based
recurrent neural network. Translation experiments on the German-English dataset
verify the efficacy of the proposed approach, which achieves higher BLEU scores
than the state of the art.
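The encoder pipeline described in the abstract (embedding, convolutional n-gram extraction, residual addition with layer normalization, then a recurrent layer) can be summarized in a short sketch. Below is a minimal PyTorch sketch assuming a bidirectional GRU for the recurrent layer, learned position embeddings, and an illustrative kernel size of 3; the class name, layer sizes, and hyperparameters are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of the convolutional-recurrent encoder described in the
# abstract. Layer sizes, kernel width, and the choice of GRU vs. LSTM are
# assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn


class ConvRecurrentEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256,
                 kernel_size=3, max_len=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(max_len, emb_dim)   # position embedding
        # 1-D convolution over the token dimension acts as an n-gram
        # (phrase-level) feature extractor.
        self.conv = nn.Conv1d(emb_dim, emb_dim, kernel_size,
                              padding=kernel_size // 2)
        self.layer_norm = nn.LayerNorm(emb_dim)
        # Recurrent layer captures the sequential (temporal) information.
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                          bidirectional=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, seq_len) integer token ids
        positions = torch.arange(src_tokens.size(1), device=src_tokens.device)
        x = self.word_emb(src_tokens) + self.pos_emb(positions)   # (B, T, E)
        # Convolutional encoding: rectified n-gram features.
        conv_out = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        # Residual connection to the original embeddings, then layer norm.
        x = self.layer_norm(x + conv_out)
        # Recurrent encoding of the normalized sequence.
        outputs, hidden = self.rnn(x)
        return outputs, hidden
```

The encoder outputs would then feed an attention-based recurrent decoder, as stated in the abstract; the decoder itself is not sketched here.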
Related papers
- Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
arXiv Detail & Related papers (2024-05-15T03:17:30Z) - Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary [1.4594704809280983]
Positional encoding is a high-dimensional representation of time indices on input data.
RNNs can encode the temporal information of data points on their own, which makes their use of positional encoding seem redundant.
arXiv Detail & Related papers (2024-01-31T23:32:20Z) - Learned layered coding for Successive Refinement in the Wyner-Ziv
Problem [18.134147308944446]
We propose a data-driven approach to explicitly learn the progressive encoding of a continuous source.
This setup refers to the successive refinement of the Wyner-Ziv coding problem.
We demonstrate that RNNs can explicitly retrieve layered binning solutions akin to scalable nested quantization.
arXiv Detail & Related papers (2023-11-06T12:45:32Z) - Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z) - Return of the RNN: Residual Recurrent Networks for Invertible Sentence
Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z) - Surrogate Gradient Spiking Neural Networks as Encoders for Large
Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Transition based Graph Decoder for Neural Machine Translation [41.7284715234202]
We propose a general Transformer-based approach for tree and graph decoding based on generating a sequence of transitions.
We show improved performance over the standard Transformer decoder, as well as over ablated versions of the model.
arXiv Detail & Related papers (2021-01-29T15:20:45Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z) - Transformer Transducer: A Streamable Speech Recognition Model with
Transformer Encoders and RNN-T Loss [14.755108017449295]
We present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system.
Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently.
We present results on the LibriSpeech dataset showing that limiting the left context for self-attention makes decoding computationally tractable for streaming.
arXiv Detail & Related papers (2020-02-07T00:04:04Z)