Question-Based Salient Span Selection for More Controllable Text
Summarization
- URL: http://arxiv.org/abs/2111.07935v1
- Date: Mon, 15 Nov 2021 17:36:41 GMT
- Title: Question-Based Salient Span Selection for More Controllable Text
Summarization
- Authors: Daniel Deutsch and Dan Roth
- Abstract summary: We propose a method for incorporating question-answering (QA) signals into a summarization model.
Our method identifies salient noun phrases (NPs) in the input document by automatically generating wh-questions that are answered by the NPs.
This QA-based signal is incorporated into a two-stage summarization model which first marks salient NPs in the input document using a classification model, then conditionally generates a summary.
- Score: 67.68208237480646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a method for incorporating question-answering (QA)
signals into a summarization model. Our method identifies salient noun phrases
(NPs) in the input document by automatically generating wh-questions that are
answered by the NPs and automatically determining whether those questions are
answered in the gold summaries. This QA-based signal is incorporated into a
two-stage summarization model which first marks salient NPs in the input
document using a classification model, then conditionally generates a summary.
Our experiments demonstrate that the models trained using QA-based supervision
generate higher-quality summaries than baseline methods of identifying salient
spans on benchmark summarization datasets. Further, we show that the content of
the generated summaries can be controlled based on which NPs are marked in the
input document. Finally, we propose a method of augmenting the training data so
the gold summaries are more consistent with the marked input spans used during
training and show how this results in models which learn to better exclude
unmarked document content.
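As a concrete illustration of the two-stage setup described in the abstract, the sketch below marks classifier-selected noun-phrase spans with inline tags before handing the document to a conditional generator. The marker tokens and helper names are illustrative assumptions, not the authors' released code.

```python
from typing import List, Tuple

# Hypothetical marker tokens; the paper's actual markers may differ.
NP_OPEN, NP_CLOSE = "<np>", "</np>"

def mark_salient_spans(document: str, spans: List[Tuple[int, int]]) -> str:
    """Stage 1 output: wrap the character spans chosen by the salience
    classifier (at training time, the NPs whose generated wh-questions are
    answered by the gold summary) with marker tokens."""
    pieces, last = [], 0
    for start, end in sorted(spans):
        pieces.append(document[last:start])
        pieces.append(f"{NP_OPEN} {document[start:end]} {NP_CLOSE}")
        last = end
    pieces.append(document[last:])
    return "".join(pieces)

def summarize(marked_document: str, generator) -> str:
    """Stage 2: any seq2seq summarizer (e.g. a BART-style model) consumes the
    marked document instead of the raw one."""
    return generator(marked_document)
```

Under this reading, marking different spans at inference time is what provides the controllability reported in the abstract: the generator learns to cover marked content and, with the augmented training data, to better exclude unmarked content.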
Related papers
- Automatically Summarizing Evidence from Clinical Trials: A Prototype
Highlighting Current Challenges [20.74608114488094]
TrialsSummarizer aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query.
The system retrieves trial publications matching a query that specifies a combination of condition, intervention(s), and outcome(s).
Top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials.
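A minimal sketch of this retrieve-then-summarize pipeline, assuming a structured query of condition, interventions, and outcomes; the function and field names are hypothetical, not the TrialsSummarizer API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TrialQuery:
    condition: str
    interventions: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)

def summarize_trials(query: TrialQuery,
                     retrieve: Callable[[TrialQuery], List[str]],
                     summarize: Callable[[List[str]], str],
                     k: int = 5) -> str:
    """Retrieve trial publications matching the structured query, keep the
    top-k ranked hits, and pass them to a multi-document summarizer."""
    top_k_trials = retrieve(query)[:k]
    return summarize(top_k_trials)
```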
arXiv Detail & Related papers (2023-03-07T17:30:48Z)
- Socratic Pretraining: Question-Driven Pretraining for Controllable
Summarization [89.04537372465612]
Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
arXiv Detail & Related papers (2022-12-20T17:27:10Z)
- Tokenization Consistency Matters for Generative Models on Extractive NLP
Tasks [54.306234256074255]
We identify the issue of tokenization inconsistency, which is commonly neglected when training generative models.
This inconsistency undermines the extractive nature of these tasks, since the tokenized output may no longer match the corresponding token sequence in the input.
We show that, with consistent tokenization, the model performs better on both in-domain and out-of-domain datasets.
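One way to make the consistency requirement concrete (an assumed helper, not the paper's code): for an extractive task, the tokenized target span should appear as a contiguous subsequence of the tokenized input, which can fail if, for example, the span's leading whitespace is dropped before a BPE tokenizer.

```python
from typing import Callable, List

def target_appears_in_input(tokenize: Callable[[str], List[int]],
                            document: str, target_span: str) -> bool:
    """Return True if the tokenized target occurs contiguously in the
    tokenized document. Inconsistent tokenization (e.g. stripping the span's
    leading space) can make this fail even though the span is copied
    verbatim from the document."""
    doc_ids, span_ids = tokenize(document), tokenize(target_span)
    n = len(span_ids)
    return any(doc_ids[i:i + n] == span_ids
               for i in range(len(doc_ids) - n + 1))
```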
arXiv Detail & Related papers (2022-12-19T23:33:21Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
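The scheme above can be read as a distillation-style objective: the retriever's distribution over its retrieved documents is pushed toward the distribution implied by how well each document lets a frozen language model reconstruct the question. The sketch below is one plausible formalization under that reading, not the released ART implementation.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def question_reconstruction_kl(retriever_scores: List[float],
                               question_logprobs: List[float]) -> float:
    """KL(teacher || retriever) over a retrieved set of documents, where the
    teacher distribution comes from each document's log-probability of
    reconstructing the original question. Minimizing this trains the
    retriever with no labeled question-document pairs."""
    teacher = softmax(question_logprobs)
    student = softmax(retriever_scores)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
```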
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- Meeting Summarization with Pre-training and Clustering Methods [6.47783315109491]
We use HMNet, a hierarchical network that employs both a word-level transformer and a turn-level transformer, as the baseline.
We extend the locate-then-summarize approach of QMSum with an intermediate clustering step.
We compare the performance of our baseline models with BART, a state-of-the-art language model that is effective for summarization.
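A high-level sketch of the locate-then-cluster-then-summarize flow described above; all helper callables are placeholders, not the paper's implementation.

```python
from typing import Callable, Dict, List

def locate_cluster_summarize(utterances: List[str], query: str,
                             locate: Callable[[List[str], str], List[str]],
                             cluster: Callable[[List[str]], Dict[int, List[str]]],
                             summarize: Callable[[List[str]], str]) -> str:
    """Locate the query-relevant meeting turns, group them into clusters of
    related content, summarize each cluster, and join the partial summaries."""
    relevant_turns = locate(utterances, query)   # locate stage (QMSum-style)
    groups = cluster(relevant_turns)             # intermediate clustering step
    return " ".join(summarize(turns) for turns in groups.values())
```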
arXiv Detail & Related papers (2021-11-16T03:14:40Z)
- Generating Self-Contained and Summary-Centric Question Answer Pairs via
Differentiable Reward Imitation Learning [7.2745835227138045]
We propose a model for generating question-answer pairs (QA pairs) with self-contained, summary-centric questions and length-constrained, article-summarizing answers.
This dataset is then used to learn a QA pair generation model whose answers are length-constrained summaries balancing brevity with sufficiency, generated jointly with their corresponding questions.
arXiv Detail & Related papers (2021-09-10T06:34:55Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework built around a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- Summary-Source Proposition-level Alignment: Task, Datasets and
Supervised Baseline [94.0601799665342]
Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task.
We propose establishing summary-source alignment as an explicit task, while introducing two major novelties.
We create a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data.
We present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
arXiv Detail & Related papers (2020-09-01T17:27:12Z)