Reinforcement Learning for on-line Sequence Transformation
- URL: http://arxiv.org/abs/2105.14097v1
- Date: Fri, 28 May 2021 20:31:25 GMT
- Title: Reinforcement Learning for on-line Sequence Transformation
- Authors: Grzegorz Rypeść, Łukasz Lepak, Paweł Wawrzyński
- Abstract summary: We introduce an architecture that learns with reinforcement to decide whether to read the next input token or write an output token.
In an experimental study we compare it with state-of-the-art methods for neural machine translation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of problems in the processing of sound and natural language, as well
as in other areas, can be reduced to simultaneously reading an input sequence
and writing an output sequence of generally different length. There are
well-developed methods that produce the output sequence based on a fully known
input. However, efficient methods that enable such transformations on-line do
not exist. In this paper we introduce an architecture that learns with
reinforcement to decide whether to read the next input token or write an output
token. This architecture is able to transform potentially infinite sequences
on-line. In an experimental study we compare it with state-of-the-art methods
for neural machine translation. While it produces slightly worse translations
than the Transformer, it outperforms the autoencoder with attention, even
though our architecture translates texts on-line, thereby solving a more
difficult problem than both reference methods.
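To make the read/write mechanism concrete, here is a minimal sketch (in Python) of an on-line transducer that interleaves reading input tokens with writing output tokens. The functions policy, update_state, and generate_token are hypothetical placeholders for learned components, not the paper's actual model; in the paper, the read/write decision itself is learned with reinforcement.

```python
# Minimal sketch of an on-line read/write transduction loop.
# `policy`, `update_state`, and `generate_token` are hypothetical placeholders
# for learned components; they are NOT the paper's actual architecture.

READ, WRITE = 0, 1
EOS = "<eos>"

def online_transduce(input_stream, policy, update_state, generate_token, init_state):
    """Interleave reading input tokens and writing output tokens on-line."""
    state = init_state
    output = []
    stream = iter(input_stream)
    exhausted = False
    while True:
        action = policy(state)                 # learned decision: READ or WRITE
        if action == READ and not exhausted:
            try:
                token = next(stream)           # consume one more input token
                state = update_state(state, token)
            except StopIteration:
                exhausted = True               # input ended: only WRITE remains
        else:
            token = generate_token(state)      # emit one output token
            if token == EOS:
                break
            output.append(token)
            state = update_state(state, token)
    return output
```

Because the loop never waits for the whole input before it starts writing, it can in principle operate on potentially infinite streams, which is the property the abstract emphasizes.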
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Fast Training of NMT Model with Data Sorting [0.0]
- Fast Training of NMT Model with Data Sorting [0.0]
The Transformer model has revolutionized Natural Language Processing tasks such as Neural Machine Translation.
One potential area for improvement is the handling of empty tokens, which the Transformer computes only to discard later.
We propose an algorithm that sorts sentence pairs based on their length before translation, minimizing the waste of computing power.
arXiv Detail & Related papers (2023-08-16T05:48:50Z) - Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
- Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation.
MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation.
We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer.
arXiv Detail & Related papers (2022-03-23T18:10:18Z) - Discovering Non-monotonic Autoregressive Orderings with Variational
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z) - Duplex Sequence-to-Sequence Learning for Reversible Machine Translation [53.924941333388155]
Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional.
We propose a duplex seq2seq neural network, REDER, and apply it to machine translation.
Experiments on widely-used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation.
arXiv Detail & Related papers (2021-05-07T18:21:57Z) - Character-level Transformer-based Neural Machine Translation [5.699756532377753]
We discuss a novel Transformer-based approach, which we compare in both speed and quality to the Transformer at the subword and character levels.
We evaluate our models on 4 language pairs from WMT'15: DE-EN, CS-EN, FI-EN and RU-EN.
The proposed architecture can be trained on a single GPU and is 34% faster than the character-level Transformer.
arXiv Detail & Related papers (2020-05-22T15:40:43Z) - Hierarchical Attention Transformer Architecture For Syntactic Spell
Correction [1.0312968200748118]
We propose a multi-encoder, single-decoder variation of the conventional Transformer.
We report significant improvements of 0.11%, 0.32%, and 0.69% in character (CER), word (WER), and sentence (SER) error rates.
Our architecture also trains 7.8 times faster and is only about 1/3 the size of the next most accurate model.
arXiv Detail & Related papers (2020-05-11T06:19:01Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Non-Autoregressive Machine Translation with Disentangled Context
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.