PairReranker: Pairwise Reranking for Natural Language Generation
- URL: http://arxiv.org/abs/2212.10555v1
- Date: Tue, 20 Dec 2022 18:56:57 GMT
- Title: PairReranker: Pairwise Reranking for Natural Language Generation
- Authors: Dongfu Jiang, Bill Yuchen Lin, Xiang Ren
- Abstract summary: We show that selecting the best output from multiple decoding methods can significantly improve performance.
We propose a novel method, \textsc{PairReranker}, which uses a single encoder and a pairwise loss function to jointly encode a source input and a pair of candidates.
Experiments on three NLG tasks demonstrated the effectiveness and flexibility of \textsc{PairReranker}, showing strong results.
- Score: 33.73671362609599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models have been successful in natural language
generation (NLG) tasks. While various decoding methods have been employed, they
often produce suboptimal results. We first present an empirical analysis of
three NLG tasks: summarization, machine translation, and constrained text
generation. We found that selecting the best output from the results of
multiple decoding methods can significantly improve performance. To further
improve reranking for NLG tasks, we proposed a novel method,
\textsc{PairReranker}, which uses a single encoder and a pairwise loss function
to jointly encode a source input and a pair of candidates and compare them.
Experiments on three NLG tasks demonstrated the effectiveness and flexibility
of \textsc{PairReranker}, showing strong results, compared with previous
baselines. In addition, our \textsc{PairReranker} can generalize to
significantly improve GPT-3 (text-davinci-003) results (e.g., 24.55\% on
CommonGen and 11.35\% on WMT18 zh-en), even though our rerankers are not
trained with any GPT-3 candidates.
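The abstract's core idea, a single encoder that jointly encodes the source together with a pair of candidates and is trained with a pairwise loss to prefer the better one, can be illustrated with a minimal sketch. The sketch below assumes a RoBERTa encoder, a simple `</s>`-separated concatenation of source and candidates, and a two-way cross-entropy formulation of the pairwise loss; these specifics, along with the names `PairwiseRerankerSketch` and `pairwise_loss`, are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class PairwiseRerankerSketch(nn.Module):
    """Illustrative sketch of pairwise reranking (not the authors' code)."""

    def __init__(self, encoder_name: str = "roberta-base"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Two logits: "candidate A is better" vs. "candidate B is better".
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, source: str, cand_a: str, cand_b: str) -> torch.Tensor:
        # Jointly encode the source and both candidates in one sequence;
        # the exact concatenation format is an assumption.
        text = f"{source} </s> {cand_a} </s> {cand_b}"
        enc = self.tokenizer(text, return_tensors="pt", truncation=True)
        pooled = self.encoder(**enc).last_hidden_state[:, 0]  # first-token pooling
        return self.head(pooled)  # shape (1, 2)


def pairwise_loss(logits: torch.Tensor, a_is_better: bool) -> torch.Tensor:
    # One simple instantiation of a pairwise loss: cross-entropy over the
    # two "which candidate wins" logits.
    target = torch.tensor([0 if a_is_better else 1])
    return nn.functional.cross_entropy(logits, target)


# Usage sketch: compare candidates produced by different decoding methods
# (e.g., beam search vs. nucleus sampling) and keep the pairwise winner.
if __name__ == "__main__":
    model = PairwiseRerankerSketch()
    logits = model("translate: Guten Morgen", "Good morning.", "Morning good.")
    best = "A" if logits.argmax(dim=-1).item() == 0 else "B"
    print(f"preferred candidate: {best}")
```

At inference time, pairwise comparisons over candidates from multiple decoding methods can be aggregated (for example, by counting wins) to select a single output, which mirrors the abstract's observation that choosing the best candidate across decoding methods improves performance.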
Related papers
- When Retriever Meets Generator: A Joint Model for Code Comment Generation [3.6781644685120924]
RAGSum fuses retrieval and generation using a single CodeT5 backbone.
A contrastive pre-training phase shapes code embeddings for nearest-neighbor search.
A lightweight self-refinement loop is deployed to polish the final output.
arXiv Detail & Related papers (2025-07-16T18:12:27Z) - Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning [34.85741925091139]
The Graph-DPEP framework is grounded in the reasoning behind triplet explanation thoughts presented in natural language.
We develop "ensemble-play", reapplying generation on the entire type list by leveraging the reasoning thoughts embedded in a sub-graph.
arXiv Detail & Related papers (2024-11-05T07:12:36Z) - Stress Detection on Code-Mixed Texts in Dravidian Languages using Machine Learning [0.0]
Stress is a common feeling in daily life, but it can affect mental well-being in some situations.
This study introduces a methodical approach to stress identification in code-mixed texts for Dravidian languages.
arXiv Detail & Related papers (2024-10-08T23:49:31Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models [47.08364281023261]
Large language models (LLMs) have demonstrated impressive capabilities in natural language generation.
However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs).
arXiv Detail & Related papers (2023-09-21T17:54:58Z) - GECTurk: Grammatical Error Correction and Detection Dataset for Turkish [1.804922416527064]
Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners.
Synthetic data generation is a common practice to overcome the scarcity of such data.
We present a flexible and synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules.
arXiv Detail & Related papers (2023-09-20T14:25:44Z) - ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval [22.882301169283323]
We propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus.
Experiments on nine datasets demonstrate that REGEN achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines using large NLG models.
arXiv Detail & Related papers (2023-05-18T04:30:09Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z) - Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z) - Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z) - TextGAIL: Generative Adversarial Imitation Learning for Text Generation [68.3579946817937]
We propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance.
Our approach uses a contrastive discriminator and proximal policy optimization (PPO) to stabilize and improve text generation performance.
arXiv Detail & Related papers (2020-04-07T00:24:35Z)