Reranking for Natural Language Generation from Logical Forms: A Study
based on Large Language Models
- URL: http://arxiv.org/abs/2309.12294v1
- Date: Thu, 21 Sep 2023 17:54:58 GMT
- Title: Reranking for Natural Language Generation from Logical Forms: A Study
based on Large Language Models
- Authors: Levon Haroutunian, Zhuang Li, Lucian Galescu, Philip Cohen, Raj
Tumuluri, Gholamreza Haffari
- Abstract summary: Large language models (LLMs) have demonstrated impressive capabilities in natural language generation.
However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs).
- Score: 47.08364281023261
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) have demonstrated impressive capabilities in
natural language generation. However, their output quality can be inconsistent,
posing challenges for generating natural language from logical forms (LFs).
This task requires the generated outputs to embody the exact semantics of LFs,
without missing any LF semantics or creating any hallucinations. In this work,
we tackle this issue by proposing a novel generate-and-rerank approach. Our
approach involves initially generating a set of candidate outputs by prompting
an LLM and subsequently reranking them using a task-specific reranker model. In
addition, we curate a manually collected dataset to evaluate the alignment
between different ranking metrics and human judgements. The chosen ranking
metrics are utilized to enhance the training and evaluation of the reranker
model. By conducting extensive experiments on three diverse datasets, we
demonstrate that the candidates selected by our reranker outperform those
selected by baseline methods in terms of semantic consistency and fluency, as
measured by three comprehensive metrics. Our findings provide strong evidence
for the effectiveness of our approach in improving the quality of generated
outputs.
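
As a rough illustration of the generate-and-rerank approach described above, here is a minimal Python sketch: sample several candidate verbalizations from an LLM, score each (logical form, candidate) pair with a reranker, and keep the highest-scoring one. The model names (`gpt2`, `cross-encoder/stsb-roberta-base`), the prompt format, and the pairwise scoring are placeholder assumptions for illustration only; the paper trains its own task-specific reranker using ranking metrics validated against human judgements.

```python
# Minimal sketch of a generate-then-rerank pipeline (illustrative, not the
# paper's actual models or prompts).
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Step 1: sample a set of candidate outputs from an LLM.
generator = pipeline("text-generation", model="gpt2")  # placeholder LLM

def generate_candidates(logical_form: str, n: int = 5) -> list[str]:
    prompt = f"Logical form: {logical_form}\nNatural language:"
    outputs = generator(
        prompt,
        num_return_sequences=n,
        do_sample=True,          # sampling yields diverse candidates
        max_new_tokens=40,
        return_full_text=False,  # drop the prompt from the generations
    )
    return [o["generated_text"].strip() for o in outputs]

# Step 2: rerank candidates with a task-specific scorer. A generic
# cross-encoder stands in here for the paper's trained reranker.
scorer_name = "cross-encoder/stsb-roberta-base"  # assumed stand-in model
tokenizer = AutoTokenizer.from_pretrained(scorer_name)
scorer = AutoModelForSequenceClassification.from_pretrained(scorer_name)

def rerank(logical_form: str, candidates: list[str]) -> str:
    inputs = tokenizer(
        [logical_form] * len(candidates), candidates,
        padding=True, truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        scores = scorer(**inputs).logits.squeeze(-1)  # one score per pair
    return candidates[int(scores.argmax())]

lf = "answer(largest(city(loc_2(stateid('texas')))))"
print(rerank(lf, generate_candidates(lf)))
```

The key design point is that generation and selection are decoupled: the LLM only needs to place a faithful verbalization somewhere in the candidate set, and the reranker is responsible for picking it out.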
Related papers
- Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem [18.020492646988746]
We present Set-Based Prompting, a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences.
Despite our inputs being out of distribution, the impact on expected accuracy is small, where the expectation is taken over uniformly chosen orderings of the candidate responses.
arXiv Detail & Related papers (2024-06-04T16:09:13Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [90.4820014819937]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when finetuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, SELM significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that incorporates the most feedback; third, finetuning the language model on the chosen refinement given the input.
We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from Human Feedback (RLHF).
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.