Recursively Summarizing Books with Human Feedback
- URL: http://arxiv.org/abs/2109.10862v1
- Date: Wed, 22 Sep 2021 17:34:18 GMT
- Title: Recursively Summarizing Books with Human Feedback
- Authors: Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nissan Stiennon, Ryan Lowe,
Jan Leike, Paul Christiano
- Abstract summary: We present progress on the task of abstractive summarization of entire fiction novels.
We use models trained on smaller parts of the task to assist humans in giving feedback on the broader task.
We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization.
- Score: 10.149048526411434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge for scaling machine learning is training models to perform
tasks that are very difficult or time-consuming for humans to evaluate. We
present progress on this problem on the task of abstractive summarization of
entire fiction novels. Our method combines learning from human feedback with
recursive task decomposition: we use models trained on smaller parts of the
task to assist humans in giving feedback on the broader task. We collect a
large volume of demonstrations and comparisons from human labelers, and
fine-tune GPT-3 using behavioral cloning and reward modeling to do
summarization recursively. At inference time, the model first summarizes small
sections of the book and then recursively summarizes these summaries to produce
a summary of the entire book. Our human labelers are able to supervise and
evaluate the models quickly, despite not having read the entire books
themselves. Our resulting model generates sensible summaries of entire books,
even matching the quality of human-written summaries in a few cases ($\sim5\%$
of books). We achieve state-of-the-art results on the recent BookSum dataset
for book-length summarization. A zero-shot question-answering model using these
summaries achieves state-of-the-art results on the challenging NarrativeQA
benchmark for answering questions about books and movie scripts. We release
datasets of samples from our model.
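As a concrete illustration of the recursive decomposition described above, here is a minimal Python sketch of the inference-time procedure: summarize fixed-size sections of the book, then repeatedly summarize groups of those summaries until a single book-level summary remains. The `summarize` call is a hypothetical stand-in for the fine-tuned GPT-3 model, and the chunk size and group size are illustrative values, not the paper's actual settings.

```python
def summarize(text: str) -> str:
    """Placeholder for a call to a learned summarization model."""
    raise NotImplementedError

def chunk(text: str, max_chars: int) -> list[str]:
    """Split text into contiguous pieces no longer than max_chars."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def recursive_summarize(book: str, max_chars: int = 8000, group_size: int = 5) -> str:
    # Depth 0: summarize small sections of the book independently.
    summaries = [summarize(piece) for piece in chunk(book, max_chars)]
    # Higher depths: repeatedly summarize groups of lower-level summaries
    # until one book-level summary remains.
    while len(summaries) > 1:
        groups = [summaries[i:i + group_size]
                  for i in range(0, len(summaries), group_size)]
        summaries = [summarize("\n\n".join(g)) for g in groups]
    return summaries[0]
```

Because each model call only sees a bounded amount of text, labelers can likewise evaluate each step against its local input, which is what lets them supervise book-length summarization without reading the whole book.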
Related papers
- Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization [0.05852077003870416]
This work leverages the transformer-based BART model for human-like summarization.
After training and fine-tuning the encoder-decoder model, it is tested on diverse sample articles.
The fine-tuned model's performance is compared with that of the baseline pretrained model.
Empirical results on BBC News articles show that the gold-standard summaries written by humans are 17% more factually consistent than the model-generated ones.
arXiv Detail & Related papers (2024-10-22T09:25:04Z)
- AugSumm: towards generalizable speech summarization using synthetic labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech.
Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary.
We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
arXiv Detail & Related papers (2024-01-10T18:39:46Z)
- UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z)
- Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high-quality, human-labeled dataset to support research into the task.
arXiv Detail & Related papers (2022-04-15T17:16:41Z)
- Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations [9.184616102949228]
We show that fluent and adequate summaries can be generated with limited training data by fine-tuning BART.
Using a carefully chosen fine-tuning dataset, this method is shown to be effective at handling longer conversations.
arXiv Detail & Related papers (2021-09-24T20:18:59Z)
- SummerTime: Text Summarization Toolkit for Non-experts [23.041775425059985]
SummerTime is a complete toolkit for text summarization, including various models, datasets and evaluation metrics.
SummerTime integrates with libraries designed for NLP researchers and provides easy-to-use APIs.
arXiv Detail & Related papers (2021-08-29T03:24:48Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- Learning to summarize from human feedback [18.964548137315333]
We show that it is possible to significantly improve summary quality by training a model to optimize for human preferences.
We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone.
Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning.
arXiv Detail & Related papers (2020-09-02T19:54:41Z)
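Optimizing for human preferences, in both this paper and the book-summarization work above, rests on a reward model trained from pairwise comparisons. The sketch below is a minimal Bradley-Terry-style comparison loss assuming PyTorch; `reward_model` is a hypothetical scorer mapping a (post, summary) pair to a scalar, not either paper's released code.

```python
import torch.nn.functional as F

def comparison_loss(reward_model, post, preferred, rejected):
    r_pref = reward_model(post, preferred)  # scalar score for the labeler's choice
    r_rej = reward_model(post, rejected)    # scalar score for the other summary
    # Maximize the log-probability that the preferred summary scores higher.
    return -F.logsigmoid(r_pref - r_rej).mean()
```

The trained reward model then serves as the optimization target for the policy, standing in for direct (and expensive) human judgments.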
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
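As a rough illustration of that noising construction, the sketch below samples one review as a pseudo-summary and fabricates noisy input versions of it; random word dropout is an assumed noising scheme here and may differ from the paper's.

```python
import random

def make_synthetic_example(reviews: list[str], n_noisy: int = 8,
                           drop_prob: float = 0.15):
    # Sample one genuine review and pretend it is the summary.
    pseudo_summary = random.choice(reviews)
    # Generate noisy versions of it to serve as the pseudo source documents.
    noisy_inputs = []
    for _ in range(n_noisy):
        kept = [w for w in pseudo_summary.split() if random.random() > drop_prob]
        noisy_inputs.append(" ".join(kept))
    return noisy_inputs, pseudo_summary
```

A denoising model trained on such pairs learns to recover the consensus content, which is why, at test time, it can treat non-consensus opinions in genuine reviews as noise.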
This list is automatically generated from the titles and abstracts of the papers in this site.