Twist Decoding: Diverse Generators Guide Each Other
- URL: http://arxiv.org/abs/2205.09273v1
- Date: Thu, 19 May 2022 01:27:53 GMT
- Title: Twist Decoding: Diverse Generators Guide Each Other
- Authors: Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu,
Dragomir Radev, Yejin Choi, Noah A. Smith
- Abstract summary: We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
- Score: 116.20780037268801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language generation technology has recently seen remarkable progress
with large-scale training, and many natural language applications are now built
upon a wide range of generation models. Combining diverse models may lead to
further progress, but conventional ensembling (e.g., shallow fusion) requires
that they share vocabulary/tokenization schemes. We introduce Twist decoding, a
simple and general inference algorithm that generates text while benefiting
from diverse models. Our method does not assume the vocabulary, tokenization or
even generation order is shared. Our extensive evaluations on machine
translation and scientific paper summarization demonstrate that Twist decoding
substantially outperforms each model decoded in isolation over various
scenarios, including cases where domain-specific and general-purpose models are
both available. Twist decoding also consistently outperforms the popular
reranking heuristic where output candidates from one model are rescored by
another. We hope that our work will encourage researchers and practitioners to
examine generation models collectively, not just independently, and to seek out
models with complementary strengths to the currently available models.
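The abstract contrasts Twist decoding with the common reranking heuristic, in which one model proposes candidate outputs and another rescores them. As a point of reference, the sketch below illustrates that reranking baseline only (not Twist decoding itself); `generate_candidates` and `score_sequence` are hypothetical stand-ins for whatever generation and scoring interfaces the two models expose.

```python
# Minimal sketch of the reranking baseline mentioned in the abstract, NOT the
# Twist decoding algorithm: model A proposes candidates, model B picks the best.
# The two callables are hypothetical stand-ins for the models' real interfaces.
from typing import Callable, List


def rerank(
    source: str,
    generate_candidates: Callable[[str, int], List[str]],  # model A: source -> n candidate outputs
    score_sequence: Callable[[str, str], float],           # model B: score (e.g. log-prob) of a candidate
    num_candidates: int = 5,
) -> str:
    """Return the model-A candidate that model B scores highest."""
    candidates = generate_candidates(source, num_candidates)
    return max(candidates, key=lambda candidate: score_sequence(source, candidate))
```

Because rescoring only reorders finished hypotheses, the second model cannot influence the text while it is being generated. According to the abstract, Twist decoding instead lets the models guide each other during generation, without requiring a shared vocabulary, tokenization, or generation order, and consistently outperforms this reranking heuristic.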
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs)
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Vector-Quantized Prompt Learning for Paraphrase Generation [18.40940464497253]
This paper proposes to generate diverse and high-quality paraphrases by exploiting the pre-trained models with instance-dependent prompts.
Extensive experiments demonstrate that the proposed method achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-11-25T07:13:06Z) - Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
arXiv Detail & Related papers (2023-10-25T03:40:50Z) - Grafting Pre-trained Models for Multimodal Headline Generation [12.063053852096514]
Multimodal headline generation uses both video frames and transcripts to generate natural language titles for videos.
Previous research on pre-trained language models and video-language models has achieved significant progress in related downstream tasks.
We propose a novel approach that grafts the video encoder from a pre-trained video-language model onto a generative pre-trained language model.
arXiv Detail & Related papers (2022-11-14T08:59:59Z) - DIRECTOR: Generator-Classifiers For Supervised Language Modeling [27.86870968048833]
Current language models achieve low perplexity, but their resulting generations still suffer from toxic responses, repetitiveness, and contradictions.
We introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output token.
arXiv Detail & Related papers (2022-06-15T17:44:08Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceive diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - Summarize and Generate to Back-translate: Unsupervised Translation of
Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z) - Deep Latent-Variable Models for Text Generation [7.119436003155924]
Deep neural network-based end-to-end architectures have been widely adopted.
The end-to-end approach conflates all sub-modules, which used to be designed with complex handcrafted rules, into a holistic encoder-decoder architecture.
This dissertation presents how deep latent-variable models can improve over the standard encoder-decoder model for text generation.
arXiv Detail & Related papers (2022-03-03T23:06:39Z)