Twist Decoding: Diverse Generators Guide Each Other
- URL: http://arxiv.org/abs/2205.09273v1
- Date: Thu, 19 May 2022 01:27:53 GMT
- Title: Twist Decoding: Diverse Generators Guide Each Other
- Authors: Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu,
Dragomir Radev, Yejin Choi, Noah A. Smith
- Abstract summary: We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
- Score: 116.20780037268801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language generation technology has recently seen remarkable progress
with large-scale training, and many natural language applications are now built
upon a wide range of generation models. Combining diverse models may lead to
further progress, but conventional ensembling (e.g., shallow fusion) requires
that they share vocabulary/tokenization schemes. We introduce Twist decoding, a
simple and general inference algorithm that generates text while benefiting
from diverse models. Our method does not assume the vocabulary, tokenization or
even generation order is shared. Our extensive evaluations on machine
translation and scientific paper summarization demonstrate that Twist decoding
substantially outperforms each model decoded in isolation over various
scenarios, including cases where domain-specific and general-purpose models are
both available. Twist decoding also consistently outperforms the popular
reranking heuristic where output candidates from one model are rescored by
another. We hope that our work will encourage researchers and practitioners to
examine generation models collectively, not just independently, and to seek out
models with complementary strengths to the currently available models.
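The abstract contrasts Twist decoding with the common reranking heuristic, in which one model proposes candidate outputs and another rescores them. As a point of reference, the sketch below illustrates that reranking baseline only (not Twist decoding itself); `generate_candidates` and `score_sequence` are hypothetical stand-ins for whatever generation and scoring interfaces the two models expose.

```python
# Minimal sketch of the reranking baseline mentioned in the abstract, NOT the
# Twist decoding algorithm: model A proposes candidates, model B picks the best.
# The two callables are hypothetical stand-ins for the models' real interfaces.
from typing import Callable, List


def rerank(
    source: str,
    generate_candidates: Callable[[str, int], List[str]],  # model A: source -> n candidate outputs
    score_sequence: Callable[[str, str], float],           # model B: score (e.g. log-prob) of a candidate
    num_candidates: int = 5,
) -> str:
    """Return the model-A candidate that model B scores highest."""
    candidates = generate_candidates(source, num_candidates)
    return max(candidates, key=lambda candidate: score_sequence(source, candidate))
```

Because rescoring only reorders finished hypotheses, the second model cannot influence the text while it is being generated. According to the abstract, Twist decoding instead lets the models guide each other during generation, without requiring a shared vocabulary, tokenization, or generation order, and consistently outperforms this reranking heuristic.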
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs)
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Vector-Quantized Prompt Learning for Paraphrase Generation [18.40940464497253]
This paper proposes to generate diverse and high-quality paraphrases by exploiting the pre-trained models with instance-dependent prompts.
Extensive experiments demonstrate that the proposed method achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-11-25T07:13:06Z) - Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
arXiv Detail & Related papers (2023-10-25T03:40:50Z) - Grafting Pre-trained Models for Multimodal Headline Generation [12.063053852096514]
Multimodal headline generation uses both video frames and transcripts to generate natural language titles for videos.
Previous research on pre-trained language models and video-language models has achieved significant progress in related downstream tasks.
We propose a novel approach that grafts the video encoder from a pre-trained video-language model onto a generative pre-trained language model.
arXiv Detail & Related papers (2022-11-14T08:59:59Z) - DIRECTOR: Generator-Classifiers For Supervised Language Modeling [27.86870968048833]
Current language models achieve low perplexity, but their resulting generations still suffer from toxic responses, repetitiveness, and contradictions.
We introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output token.
arXiv Detail & Related papers (2022-06-15T17:44:08Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceive diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - Summarize and Generate to Back-translate: Unsupervised Translation of
Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z) - Deep Latent-Variable Models for Text Generation [7.119436003155924]
Deep neural network-based end-to-end architectures have been widely adopted.
The end-to-end approach conflates all sub-modules, which used to be designed with complex handcrafted rules, into a holistic encoder-decoder architecture.
This dissertation presents how deep latent-variable models can improve over the standard encoder-decoder model for text generation.
arXiv Detail & Related papers (2022-03-03T23:06:39Z)