Towards More Efficient Insertion Transformer with Fractional Positional
Encoding
- URL: http://arxiv.org/abs/2112.06295v1
- Date: Sun, 12 Dec 2021 18:38:27 GMT
- Title: Towards More Efficient Insertion Transformer with Fractional Positional
Encoding
- Authors: Zhisong Zhang, Yizhe Zhang, Bill Dolan
- Abstract summary: Auto-regressive neural sequence models have been shown to be effective across text generation tasks.
Their left-to-right decoding order prevents generation from being parallelized.
Insertion Transformer is an attractive alternative that allows outputting multiple tokens in a single generation step.
- Score: 44.45401243989363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auto-regressive neural sequence models have been shown to be effective across
text generation tasks. However, their left-to-right decoding order prevents
generation from being parallelized. Insertion Transformer (Stern et al., 2019)
is an attractive alternative that allows outputting multiple tokens in a single
generation step. Nevertheless, due to the incompatibility of absolute
positional encoding and insertion-based generation schemes, it needs to refresh
the encoding of every token in the generated partial hypotheses at each step,
which could be costly. We design a novel incremental positional encoding scheme
for insertion transformers called Fractional Positional Encoding (FPE), which
allows reusing representations calculated in previous steps. Empirical studies
on various language generation tasks demonstrate the effectiveness of FPE,
which leads to a reduction in floating point operations and to latency
improvements in batched decoding.
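The abstract does not spell out exactly how fractional positions are assigned, but the core idea of an insertion-stable positional scheme can be illustrated with a minimal sketch: a newly inserted token receives a position value strictly between those of its neighbours (here, their midpoint), so positions assigned in earlier steps never change and representations tied to them can be reused. The class and method names below are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumed, not the paper's implementation) of an
# insertion-stable "fractional" positional scheme: inserting a token never
# changes the position value of any existing token.

class FractionalPositions:
    def __init__(self):
        # Two sentinel boundaries; real tokens are inserted between them.
        self.positions = [0.0, 1.0]
        self.tokens = ["<s>", "</s>"]

    def insert(self, slot, token):
        """Insert `token` between slot and slot+1, giving it the midpoint position."""
        left, right = self.positions[slot], self.positions[slot + 1]
        pos = (left + right) / 2.0          # fractional position: 0.5, 0.25, 0.75, ...
        self.positions.insert(slot + 1, pos)
        self.tokens.insert(slot + 1, token)
        return pos                          # all previously assigned positions are untouched


seq = FractionalPositions()
seq.insert(0, "brown")   # gets position 0.5
seq.insert(0, "the")     # gets position 0.25
seq.insert(2, "fox")     # gets position 0.75
print(list(zip(seq.tokens, seq.positions)))
# [('<s>', 0.0), ('the', 0.25), ('brown', 0.5), ('fox', 0.75), ('</s>', 1.0)]
```

By contrast, with ordinary index-based absolute positional encoding, every insertion shifts the indices of all tokens to the right of the insertion slot, which is why the original Insertion Transformer has to refresh the encoding of the whole partial hypothesis at each step.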
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Self-Infilling Code Generation [60.12883980846781]
We introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.
We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding.
Our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
arXiv Detail & Related papers (2023-11-29T16:02:06Z) - LAIT: Efficient Multi-Segment Encoding in Transformers with
Layer-Adjustable Interaction [31.895986544484206]
We introduce Layer-Adjustable Interactions in Transformers (LAIT).
Within LAIT, segmented inputs are first encoded independently, and then jointly.
We find that LAIT is able to reduce attention FLOPs by 30-50% on many tasks while preserving high accuracy.
arXiv Detail & Related papers (2023-05-31T06:09:59Z) - Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z) - Demystifying the Better Performance of Position Encoding Variants for
Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on GLUE, XTREME and WMT benchmarks while saving costs.
arXiv Detail & Related papers (2021-04-18T03:44:57Z) - ENCONTER: Entity Constrained Progressive Sequence Generation via
Insertion-based Transformer [11.310502327308575]
Autoregressive language models do not perform well under hard lexical constraints.
Progressive insertion-based transformers can overcome this limitation.
The paper proposes the entity-constrained insertion transformer (ENCONTER).
Our experiments show that ENCONTER outperforms other baseline models in several performance metrics.
arXiv Detail & Related papers (2021-03-17T10:24:10Z) - On Efficient Training, Controllability and Compositional Generalization
of Insertion-based Language Generators [18.98725770517241]
InsNet is an insertion-based sequence model that can be trained as efficiently as transformer decoders.
We evaluate InsNet on story generation and CleVR-CoGENT captioning.
arXiv Detail & Related papers (2021-02-12T11:05:02Z) - Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z) - On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z) - Addressing Some Limitations of Transformers with Feedback Memory [51.94640029417114]
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
We propose the Feedback Transformer architecture that exposes all previous representations to all future representations.
We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers.
arXiv Detail & Related papers (2020-02-21T16:37:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.