Non-Autoregressive Text Generation with Pre-trained Language Models
- URL: http://arxiv.org/abs/2102.08220v1
- Date: Tue, 16 Feb 2021 15:30:33 GMT
- Title: Non-Autoregressive Text Generation with Pre-trained Language Models
- Authors: Yixuan Su, Deng Cai, Yan Wang, David Vandyke, Simon Baker, Piji Li,
Nigel Collier
- Abstract summary: We show that BERT can be employed as the backbone of a NAG model to greatly improve performance.
We devise mechanisms to alleviate the two common problems of vanilla NAG models.
We propose a new decoding strategy, ratio-first, for applications where the output lengths can be approximately estimated beforehand.
- Score: 40.50508206201288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-autoregressive generation (NAG) has recently attracted great attention
due to its fast inference speed. However, the generation quality of existing
NAG models still lags behind their autoregressive counterparts. In this work,
we show that BERT can be employed as the backbone of a NAG model to greatly
improve performance. Additionally, we devise mechanisms to alleviate the two
common problems of vanilla NAG models: the inflexibility of prefixed output
length and the conditional independence of individual token predictions.
Lastly, to further increase the speed advantage of the proposed model, we
propose a new decoding strategy, ratio-first, for applications where the output
lengths can be approximately estimated beforehand. For a comprehensive
evaluation, we test the proposed model on three text generation tasks,
including text summarization, sentence compression and machine translation.
Experimental results show that our model significantly outperforms existing
non-autoregressive baselines and achieves competitive performance with many
strong autoregressive models. In addition, we also conduct extensive analysis
experiments to reveal the effect of each proposed component.
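To make the approach concrete, below is a minimal sketch, assuming a BERT-style masked language model callable as `model(inputs)` that returns per-position vocabulary logits; the helper name `ratio_first_decode` and the parameters `length_ratio` and `mask_id` are illustrative assumptions, not the authors' released implementation. It shows the two ideas from the abstract: all target positions are filled in one parallel forward pass, and under ratio-first decoding only the first positions up to an estimated fraction of the source length are decoded.
```python
# Minimal sketch of BERT-backbone non-autoregressive decoding with
# ratio-first length handling. `model`, `mask_id`, and `length_ratio`
# are assumptions for illustration, not the paper's actual code.
import torch


def ratio_first_decode(model, src_ids, length_ratio=1.2, mask_id=103, max_len=128):
    """Fill every target position in one parallel forward pass.

    Rather than predicting an exact output length, decode only the first
    int(length_ratio * source_length) + 1 positions, which is the rough
    idea behind the ratio-first strategy described in the abstract.
    """
    src_len = src_ids.size(1)
    tgt_len = min(max_len, int(length_ratio * src_len) + 1)

    # Append a block of [MASK] placeholders after the source tokens.
    masks = torch.full((src_ids.size(0), tgt_len), mask_id, dtype=torch.long)
    inputs = torch.cat([src_ids, masks], dim=1)

    with torch.no_grad():
        logits = model(inputs)  # assumed shape: (batch, src_len + tgt_len, vocab)

    # Every masked position is predicted simultaneously; there is no
    # left-to-right loop, which is what makes the decoding non-autoregressive.
    return logits[:, src_len:, :].argmax(dim=-1)
```
Because each position is resolved by an independent argmax, the sketch also makes visible the conditional-independence limitation that the paper's proposed mechanisms are designed to alleviate.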
Related papers
- GASE: Generatively Augmented Sentence Encoding [0.0]
We propose an approach to enhance sentence embeddings by applying generative text models for data augmentation at inference time.
Generatively Augmented Sentence Encoding (GASE) uses diverse synthetic variants of input texts generated by paraphrasing, summarising or extracting keywords.
We find that generative augmentation leads to larger performance improvements for embedding models with lower baseline performance.
arXiv Detail & Related papers (2024-11-07T17:53:47Z)
- DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling [51.055580277828]
We propose DynaMo, a suite of multi-token prediction language models that reduce net inference times.
Our models dynamically predict multiple tokens based on their confidence in the predicted joint probability distribution (an illustrative sketch of this idea appears after this list).
We also propose novel ways to enhance the estimated joint probability to improve text generation quality.
arXiv Detail & Related papers (2024-05-01T22:17:57Z)
- Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction [3.448070371030467]
We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings.
Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization.
Empirical evaluation on the PHM2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score.
arXiv Detail & Related papers (2023-11-26T09:50:32Z)
- Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation [98.37871690400766]
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation.
Existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models.
We propose Pre-trained Directed Acyclic Transformer to promote prediction consistency in NAR generation.
arXiv Detail & Related papers (2023-04-24T02:30:33Z)
- Leveraging Pre-trained Models for Failure Analysis Triplets Generation [0.0]
We leverage the attention mechanism of pre-trained causal language models such as the Transformer for the downstream task of generating Failure Analysis Triplets (FATs).
We observe that Generative Pre-trained Transformer 2 (GPT2) outperformed other transformer models for the failure analysis triplet generation (FATG) task.
In particular, we observe that GPT2 (with 1.5B parameters) outperforms pre-trained BERT, BART and GPT3 by a large margin on ROUGE.
arXiv Detail & Related papers (2022-10-31T17:21:15Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST).
Our experiments on three generation benchmarks (question generation, summarization and paraphrase generation) show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z)
- End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline [20.431647446999996]
We propose a simple yet effective baseline for coreference resolution.
Our model is a simplified version of the original neural coreference resolution model.
Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models.
arXiv Detail & Related papers (2021-07-04T18:12:24Z)
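As noted in the DynaMo entry above, the following is an illustrative sketch of the general idea of accepting several proposed tokens per step only while their running joint probability stays above a confidence threshold. The function and argument names (`accept_multi_tokens`, `token_probs`, `proposed_ids`, `threshold`) are assumptions for illustration, not DynaMo's actual implementation.
```python
# Hedged illustration of confidence-gated multi-token acceptance; the
# function and argument names are assumptions, not DynaMo's API.
import torch


def accept_multi_tokens(token_probs, proposed_ids, threshold=0.5):
    """Return the prefix of `proposed_ids` whose running joint probability
    stays at or above `threshold`; always emit at least one token."""
    accepted = []
    joint = 1.0
    for prob, tok in zip(token_probs.tolist(), proposed_ids.tolist()):
        joint *= prob
        if joint < threshold and accepted:
            break
        accepted.append(tok)
    return accepted


# Example: with the default threshold, only the first two tokens are kept.
probs = torch.tensor([0.9, 0.8, 0.4])
ids = torch.tensor([17, 42, 7])
print(accept_multi_tokens(probs, ids))  # -> [17, 42]
```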