On Improving Summarization Factual Consistency from Natural Language
Feedback
- URL: http://arxiv.org/abs/2212.09968v2
- Date: Mon, 16 Oct 2023 04:31:22 GMT
- Title: On Improving Summarization Factual Consistency from Natural Language
Feedback
- Authors: Yixin Liu, Budhaditya Deb, Milagro Teruel, Aaron Halfaker, Dragomir
Radev, Ahmed H. Awadallah
- Abstract summary: We study whether informational feedback in natural language can be leveraged to improve generation quality and user preference alignment.
We collect a high-quality dataset, DeFacto, containing human demonstrations and informational natural language feedback.
We show that DeFacto can provide factually consistent human-edited summaries.
- Score: 35.03102318835244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent progress in language generation models, their outputs may
not always meet user expectations. In this work, we study whether informational
feedback in natural language can be leveraged to improve generation quality and
user preference alignment. To this end, we consider factual consistency in
summarization, the quality that the summary should only contain information
supported by the input documents, as the user-expected preference. We collect a
high-quality dataset, DeFacto, containing human demonstrations and
informational natural language feedback consisting of corrective instructions,
edited summaries, and explanations with respect to the factual consistency of
the summary. Using our dataset, we study three natural language generation
tasks: (1) editing a summary by following the human feedback, (2) generating
human feedback for editing the original summary, and (3) revising the initial
summary to correct factual errors by generating both the human feedback and
edited summary. We show that DeFacto can provide factually consistent
human-edited summaries and further insights into summarization factual
consistency thanks to its informational natural language feedback. We further
demonstrate that fine-tuned language models can leverage our dataset to improve
the summary factual consistency, while large language models lack the zero-shot
learning ability in our proposed tasks that require controllable text
generation.
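As a rough illustration of the three tasks above, the sketch below frames them as text-to-text training pairs built from a single DeFacto-style record; the field names (document, candidate, instruction, edited) are assumed for illustration and may not match the released dataset.

```python
# Minimal sketch (not the authors' released code): framing the three DeFacto
# tasks as text-to-text pairs. Field names are assumed for illustration.

def make_examples(record: dict) -> dict:
    doc = record["document"]          # source document
    cand = record["candidate"]        # system summary that may contain errors
    feedback = record["instruction"]  # corrective instruction in natural language
    edited = record["edited"]         # human-edited, factually consistent summary

    return {
        # (1) Editing: apply the human feedback to the candidate summary.
        "editing": {
            "input": f"Document: {doc}\nSummary: {cand}\nInstruction: {feedback}",
            "target": edited,
        },
        # (2) Critiquing: generate the feedback needed to fix the candidate.
        "critiquing": {
            "input": f"Document: {doc}\nSummary: {cand}",
            "target": feedback,
        },
        # (3) Editing + critiquing: produce both the feedback and the revision.
        "edit_and_critique": {
            "input": f"Document: {doc}\nSummary: {cand}",
            "target": f"Instruction: {feedback}\nEdited summary: {edited}",
        },
    }
```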
Related papers
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address the problem of factual inconsistency in abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
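A minimal sketch of the reference-free entailment reward idea from the entry above, assuming an off-the-shelf NLI model (roberta-large-mnli) loaded via the transformers library; it illustrates only the reward computation, not the authors' reinforcement-learning setup.

```python
# Sketch of a reference-free entailment reward for summarization RL.
# Assumes the public roberta-large-mnli checkpoint; not the authors' code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).eval()

def entailment_reward(document: str, summary: str) -> float:
    """Probability that the document entails the summary, usable as an RL reward."""
    inputs = tokenizer(document, summary, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up the entailment index from the model config instead of hardcoding it.
    entail_idx = {v.lower(): k for k, v in model.config.id2label.items()}["entailment"]
    return probs[entail_idx].item()
```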
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that incorporates the most feedback; and third, finetuning the language model on the chosen refinement.
We show theoretically that ILF can be viewed as Bayesian inference, similar to Reinforcement Learning from Human Feedback.
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
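A schematic sketch of one ILF iteration as summarized in the entry above; generate_refinements, select_best, and finetune are hypothetical stand-ins for the model calls, not the paper's actual interfaces.

```python
# Schematic sketch of one iteration of imitation learning from language
# feedback (hypothetical interfaces, not the paper's released code).
from typing import Callable, List, Tuple

def ilf_round(
    model,
    data: List[Tuple[str, str, str]],   # (input text, initial output, feedback)
    generate_refinements: Callable,      # (model, text, output, feedback) -> list[str]
    select_best: Callable,               # (feedback, refinements) -> str
    finetune: Callable,                  # (model, training_pairs) -> model
):
    training_pairs = []
    for text, initial_output, feedback in data:
        # Step 1: condition the LM on input, initial output, and feedback.
        refinements = generate_refinements(model, text, initial_output, feedback)
        # Step 2: pick the refinement that best incorporates the feedback.
        best = select_best(feedback, refinements)
        # Step 3: collect (input, refinement) pairs for supervised finetuning.
        training_pairs.append((text, best))
    return finetune(model, training_pairs)
```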
- Towards Improving Faithfulness in Abstractive Summarization [37.19777407790153]
We propose a Faithfulness Enhanced Summarization model (FES) to improve fidelity in abstractive summarization.
Our model outperforms strong baselines in experiments on CNN/DM and XSum.
arXiv Detail & Related papers (2022-10-04T19:52:09Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
- Fine-tuning GPT-3 for Russian Text Summarization [77.34726150561087]
This paper showcases ruGPT3's ability to summarize texts by fine-tuning it on a corpus of Russian news articles with corresponding human-generated summaries.
We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function.
arXiv Detail & Related papers (2021-08-07T19:01:40Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.