ENCONTER: Entity Constrained Progressive Sequence Generation via
Insertion-based Transformer
- URL: http://arxiv.org/abs/2103.09548v1
- Date: Wed, 17 Mar 2021 10:24:10 GMT
- Title: ENCONTER: Entity Constrained Progressive Sequence Generation via
Insertion-based Transformer
- Authors: Lee-Hsun Hsieh and Yang-Yin Lee and Ee-Peng Lim
- Abstract summary: Autoregressive language models do not perform well under hard lexical constraints.
Progressive insertion-based transformers can overcome this limitation.
The paper proposes the Entity-constrained insertion transformer (ENCONTER).
Our experiments show that ENCONTER outperforms other baseline models in several performance metrics.
- Score: 11.310502327308575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained on large amounts of data, autoregressive language models are able to generate high-quality sequences. However, these models do not perform well under hard lexical constraints, as they lack fine control over the content generation process. Progressive insertion-based transformers can overcome this limitation and efficiently generate a sequence in parallel given some input tokens as constraints. These transformers, however, may fail to support hard lexical constraints, as their generation process is more likely to terminate prematurely. The paper analyses such early termination problems and proposes
the Entity-constrained insertion transformer (ENCONTER), a new insertion
transformer that addresses the above pitfall without compromising much
generation efficiency. We introduce a new training strategy that considers
predefined hard lexical constraints (e.g., entities to be included in the
generated sequence). Our experiments show that ENCONTER outperforms other baseline models on several performance metrics, rendering it more suitable for practical applications. Our code is available at
https://github.com/LARC-CMU-SMU/Enconter
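To make the progressive insertion idea concrete, below is a minimal Python sketch of entity-constrained generation: the hard constraints (entity tokens) form the initial canvas, and each round inserts a token into every slot in parallel until all slots predict a special no-insertion marker. The InsertionModel stub, the NO_INSERT marker, and the toy tokens are illustrative assumptions, not the ENCONTER implementation from the repository above.

```python
# Minimal sketch of entity-constrained progressive insertion decoding.
# The InsertionModel below is a hypothetical stand-in, not the ENCONTER
# code from https://github.com/LARC-CMU-SMU/Enconter; it only shows the
# control flow: start from the hard lexical constraints (entity tokens)
# and insert into every slot in parallel until all slots stay empty.

from typing import List

NO_INSERT = "<none>"  # special token meaning "leave this slot empty"


class InsertionModel:
    """Toy stand-in for a trained insertion transformer."""

    def predict_slots(self, tokens: List[str]) -> List[str]:
        # A real model would score the vocabulary for each of the
        # len(tokens) + 1 slots; this stub stops immediately.
        return [NO_INSERT] * (len(tokens) + 1)


def generate(entities: List[str], model: InsertionModel, max_rounds: int = 16) -> List[str]:
    tokens = list(entities)  # the constraints are the initial canvas
    for _ in range(max_rounds):
        proposals = model.predict_slots(tokens)
        if all(p == NO_INSERT for p in proposals):
            break  # generation terminates when no slot wants a new token
        new_tokens: List[str] = []
        # Interleave slots and tokens: slot_0, tok_0, slot_1, tok_1, ..., slot_n
        for i, tok in enumerate(tokens):
            if proposals[i] != NO_INSERT:
                new_tokens.append(proposals[i])
            new_tokens.append(tok)  # constraint/context tokens are never dropped
        if proposals[-1] != NO_INSERT:
            new_tokens.append(proposals[-1])
        tokens = new_tokens
    return tokens


if __name__ == "__main__":
    # With the stub model, the output is just the entity constraints themselves.
    print(generate(["Lee-Hsun", "Hsieh", "SMU"], InsertionModel()))
```

Because the constraint tokens are part of the canvas from the first round and are never removed, every generated sequence is guaranteed to contain them, which is the property that plain autoregressive decoding cannot enforce.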
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z) - Fourier Transformer: Fast Long Range Modeling by Removing Sequence
Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z) - Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance. (A rough sketch of this early-exit idea appears after this list.)
arXiv Detail & Related papers (2022-07-14T17:00:19Z) - Towards More Efficient Insertion Transformer with Fractional Positional
Encoding [44.45401243989363]
Auto-regressive neural sequence models have been shown to be effective across text generation tasks.
Their left-to-right decoding order prevents generation from being parallelized.
Insertion Transformer is an attractive alternative that allows outputting multiple tokens in a single generation step.
arXiv Detail & Related papers (2021-12-12T18:38:27Z) - Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU [19.103130032967663]
Incremental processing allows interactive systems to respond based on partial inputs.
Recent work attempts to apply Transformers incrementally via restart-incrementality.
This approach is computationally costly and does not scale efficiently for long sequences.
arXiv Detail & Related papers (2021-09-15T15:20:29Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
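As referenced in the CALM entry above, here is a rough Python sketch of confidence-based early exit: decoding for a timestep stops at the first layer whose top token probability clears a threshold, spending fewer layers on "easy" tokens. The layer_logits callback, the 0.9 threshold, and the toy model are assumptions for illustration, not the authors' released implementation.

```python
# Rough sketch of the confidence-based early-exit idea behind CALM
# (Confident Adaptive Language Modeling). The layer-wise logits callback
# and fixed threshold are illustrative assumptions, not the released code.

import math
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decode(
    layer_logits: Callable[[int], List[float]],  # vocabulary logits after layer k
    num_layers: int,
    threshold: float = 0.9,
) -> int:
    """Pick the next token, exiting at the first layer whose top probability
    clears the confidence threshold, skipping the remaining layers."""
    for layer in range(1, num_layers + 1):
        probs = softmax(layer_logits(layer))
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold or layer == num_layers:
            return best
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    # Toy "model" whose logits sharpen with depth; it exits before layer 6.
    fake = lambda k: [0.1 * k, 1.0 * k, 0.2 * k]
    print(early_exit_decode(fake, num_layers=6))
```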
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information presented) and is not responsible for any consequences arising from its use.