ENCONTER: Entity Constrained Progressive Sequence Generation via
Insertion-based Transformer
- URL: http://arxiv.org/abs/2103.09548v1
- Date: Wed, 17 Mar 2021 10:24:10 GMT
- Title: ENCONTER: Entity Constrained Progressive Sequence Generation via
Insertion-based Transformer
- Authors: Lee-Hsun Hsieh and Yang-Yin Lee and Ee-Peng Lim
- Abstract summary: Autoregressive language models do not perform well under hard lexical constraints.
Progressive insertion-based transformers can overcome this limitation.
The paper proposes the Entity-constrained insertion transformer (ENCONTER).
Our experiments show that ENCONTER outperforms other baseline models in several performance metrics.
- Score: 11.310502327308575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained on large amounts of data, autoregressive language models are
able to generate high-quality sequences. However, these models do not perform
well under hard lexical constraints because they lack fine control over the
content generation process. Progressive insertion-based transformers can
overcome this limitation and efficiently generate a sequence in parallel given
some input tokens as constraints. These transformers, however, may fail to
support hard lexical constraints because their generation process is more
likely to terminate prematurely. The paper analyses such early termination
problems and proposes the Entity-constrained insertion transformer (ENCONTER),
a new insertion transformer that addresses this pitfall without compromising
much generation efficiency. We introduce a new training strategy that considers
predefined hard lexical constraints (e.g., entities to be included in the
generated sequence). Our experiments show that ENCONTER outperforms other
baseline models in several performance metrics, rendering it more suitable for
practical applications. Our code is available at
https://github.com/LARC-CMU-SMU/Enconter
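
To make the mechanism described in the abstract concrete, the sketch below illustrates generic progressive insertion-based decoding under hard lexical (entity) constraints: the entities seed the initial sequence, each round the model proposes at most one token for every slot between existing tokens, all proposed tokens are inserted in parallel, and generation stops once every slot predicts "no insertion" (the premature-termination risk the paper targets). This is a hedged illustration only; the InsertionModel interface, the predict_insertions method, and the max_rounds cap are assumptions made for exposition, not ENCONTER's actual API (see the repository above for the authors' implementation).

# Minimal, illustrative sketch of constrained progressive insertion decoding.
# Not the ENCONTER implementation; the model interface below is hypothetical.
from typing import List, Optional, Protocol


class InsertionModel(Protocol):
    # Hypothetical interface: given the current sequence of N tokens, return
    # N + 1 predictions, one per insertion slot (before, between, and after
    # the existing tokens); None means "insert nothing into this slot".
    def predict_insertions(self, tokens: List[str]) -> List[Optional[str]]:
        ...


def constrained_progressive_generate(
    model: InsertionModel,
    entity_tokens: List[str],
    max_rounds: int = 16,
) -> List[str]:
    # The hard lexical constraints (entities) form the initial sequence, so
    # they are guaranteed to appear in the output in the given order.
    tokens = list(entity_tokens)

    for _ in range(max_rounds):
        slot_predictions = model.predict_insertions(tokens)
        assert len(slot_predictions) == len(tokens) + 1

        # Insert all predicted tokens in parallel; iterating slots from right
        # to left keeps earlier slot indices valid while the list grows.
        inserted = False
        for slot in reversed(range(len(slot_predictions))):
            token = slot_predictions[slot]
            if token is not None:
                tokens.insert(slot, token)
                inserted = True

        # Termination: every slot predicted "no insertion". Per the abstract,
        # a naive training scheme makes this happen too early; ENCONTER's
        # training strategy is designed to prevent that.
        if not inserted:
            break

    return tokens

For example, given entity_tokens = ["Singapore", "Management", "University"], a trained insertion model would be expected to flesh out a full sentence around those tokens while keeping them intact and in order.
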
Related papers
- xPerT: Extended Persistence Transformer [0.0]
A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales.
We propose a novel transformer architecture called the Extended Persistence Transformer (xPerT), which is highly scalable.
xPerT reduces GPU memory usage by over 90% and improves accuracy on multiple datasets.
arXiv Detail & Related papers (2024-10-18T06:07:22Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to 3x, while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- Towards More Efficient Insertion Transformer with Fractional Positional Encoding [44.45401243989363]
Auto-regressive neural sequence models have been shown to be effective across text generation tasks.
Their left-to-right decoding order prevents generation from being parallelized.
Insertion Transformer is an attractive alternative that allows outputting multiple tokens in a single generation step.
arXiv Detail & Related papers (2021-12-12T18:38:27Z)
- Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU [19.103130032967663]
Incremental processing allows interactive systems to respond based on partial inputs.
Recent work attempts to apply Transformers incrementally via restart-incrementality.
This approach is computationally costly and does not scale efficiently for long sequences.
arXiv Detail & Related papers (2021-09-15T15:20:29Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)