Reranking for Natural Language Generation from Logical Forms: A Study
based on Large Language Models
- URL: http://arxiv.org/abs/2309.12294v1
- Date: Thu, 21 Sep 2023 17:54:58 GMT
- Title: Reranking for Natural Language Generation from Logical Forms: A Study
based on Large Language Models
- Authors: Levon Haroutunian, Zhuang Li, Lucian Galescu, Philip Cohen, Raj
Tumuluri, Gholamreza Haffari
- Abstract summary: Large language models (LLMs) have demonstrated impressive capabilities in natural language generation.
However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs).
- Score: 47.08364281023261
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) have demonstrated impressive capabilities in
natural language generation. However, their output quality can be inconsistent,
posing challenges for generating natural language from logical forms (LFs).
This task requires the generated outputs to embody the exact semantics of LFs,
without missing any LF semantics or creating any hallucinations. In this work,
we tackle this issue by proposing a novel generate-and-rerank approach. Our
approach involves initially generating a set of candidate outputs by prompting
an LLM and subsequently reranking them using a task-specific reranker model. In
addition, we curate a manually collected dataset to evaluate the alignment
between different ranking metrics and human judgements. The chosen ranking
metrics are utilized to enhance the training and evaluation of the reranker
model. By conducting extensive experiments on three diverse datasets, we
demonstrate that the candidates selected by our reranker outperform those
selected by baseline methods in terms of semantic consistency and fluency, as
measured by three comprehensive metrics. Our findings provide strong evidence
for the effectiveness of our approach in improving the quality of generated
outputs.
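As a rough illustration of the generate-and-rerank pipeline described in the abstract, the sketch below over-generates candidates by prompting an LLM and keeps the candidate that a task-specific reranker scores highest. The helpers `llm_generate` and `reranker_score` are hypothetical placeholders for any LLM sampling call and any reranker model, not the authors' implementation.

```python
# Minimal sketch of a generate-and-rerank pipeline (hypothetical helpers,
# not the paper's code): over-generate candidates, then keep the best-scored one.

def llm_generate(prompt: str, n: int) -> list[str]:
    """Hypothetical stub: sample n candidate verbalizations from an LLM."""
    raise NotImplementedError

def reranker_score(logical_form: str, candidate: str) -> float:
    """Hypothetical stub: task-specific reranker scoring semantic consistency and fluency."""
    raise NotImplementedError

def generate_and_rerank(logical_form: str, n_candidates: int = 8) -> str:
    prompt = f"Express the following logical form in fluent English:\n{logical_form}"
    candidates = llm_generate(prompt, n_candidates)  # step 1: prompt the LLM for candidates
    # step 2: rerank the candidates and return the highest-scoring one
    return max(candidates, key=lambda c: reranker_score(logical_form, c))
```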
Related papers
- Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning [34.85741925091139]
The Graph-DPEP framework is grounded in the reasoning behind triplet explanation thoughts presented in natural language.
We develop "ensemble-play", reapplying generation on the entire type list by leveraging the reasoning thoughts embedded in a sub-graph.
arXiv Detail & Related papers (2024-11-05T07:12:36Z)
- Investigating a Benchmark for Training-set free Evaluation of Linguistic Capabilities in Machine Reading Comprehension [12.09297288867446]
We examine a framework for evaluating optimised models in a training-set-free setting on synthetically generated challenge sets.
We find that despite the simplicity of the generation method, the data can compete with crowd-sourced datasets with regard to naturalness and lexical diversity.
We conduct further experiments and show that state-of-the-art language model-based MRC systems can learn to succeed on the challenge set.
arXiv Detail & Related papers (2024-08-09T12:23:36Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps applied iteratively; the first conditions the language model on the input, an initial LM output, and feedback to generate refinements.
We show theoretically that ILF can be viewed as Bayesian inference, similar to reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
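A minimal sketch of the recite-and-answer idea summarized above, assuming a generic single-completion function `llm` (a hypothetical placeholder, not the RECITE implementation): the model first recites relevant background from its own memory, then answers conditioned on that recitation.

```python
# Hypothetical sketch of a recite-and-answer scheme; `llm` is a placeholder
# for any single-completion LLM call, not the RECITE codebase.

def llm(prompt: str) -> str:
    """Hypothetical stub: return one LLM completion for the prompt."""
    raise NotImplementedError

def recite_and_answer(question: str) -> str:
    # Step 1: have the model recite background passages relevant to the question.
    recitation = llm(f"Recite background passages relevant to this question:\n{question}")
    # Step 2: answer the question conditioned on the recited evidence.
    return llm(f"Background:\n{recitation}\n\nQuestion: {question}\nAnswer:")
```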