Summary-Source Proposition-level Alignment: Task, Datasets and
Supervised Baseline
- URL: http://arxiv.org/abs/2009.00590v2
- Date: Wed, 22 Sep 2021 20:41:44 GMT
- Title: Summary-Source Proposition-level Alignment: Task, Datasets and
Supervised Baseline
- Authors: Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob
Goldberger, Mohit Bansal, Ido Dagan
- Abstract summary: Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task.
We propose establishing summary-source alignment as an explicit task, while introducing two major novelties.
We create a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data.
We present a supervised proposition alignment baseline model, showing improved alignment quality over the unsupervised approach.
- Score: 94.0601799665342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning sentences in a reference summary with their counterparts in source
documents has been shown to be a useful auxiliary summarization task, notably for
generating training data for salience detection. Despite its assessed utility,
the alignment step was mostly approached with heuristic unsupervised methods,
typically ROUGE-based, and was never independently optimized or evaluated. In
this paper, we propose establishing summary-source alignment as an explicit
task, while introducing two major novelties: (1) applying it at the more
accurate proposition span level, and (2) approaching it as a supervised
classification task. To that end, we created a novel training dataset for
proposition-level alignment, derived automatically from available summarization
evaluation data. In addition, we crowdsourced dev and test datasets, enabling
model development and proper evaluation. Utilizing these data, we present a
supervised proposition alignment baseline model, showing improved
alignment quality over the unsupervised approach.
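The abstract frames alignment as a supervised binary classification over (summary proposition, source proposition) pairs. As a rough illustration of that framing only — the lexical-overlap feature and fixed threshold below are stand-ins, not the paper's trained model — a pairwise aligner can be sketched as:

```python
# Sketch: summary-source alignment as classification over proposition
# pairs. A real model would use learned features; here a Jaccard
# token-overlap score with a hand-set threshold plays that role.

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two propositions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align(summary_props, source_props, threshold=0.3):
    """Return (summary_idx, source_idx) pairs classified as aligned."""
    return [
        (i, j)
        for i, s in enumerate(summary_props)
        for j, d in enumerate(source_props)
        if token_overlap(s, d) >= threshold
    ]

summary = ["the company reported record profits"]
source = [
    "the company reported record profits in the fourth quarter",
    "analysts had predicted a modest decline",
]
print(align(summary, source))  # [(0, 0)]
```

Each candidate pair gets an independent aligned/not-aligned decision, which is what makes crowdsourced pair labels directly usable as training data.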
Related papers
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
The alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- GPT Self-Supervision for a Better Data Annotator [22.598300095822026]
We propose a Generative Pretrained Transformer (GPT) self-supervision annotation method.
The proposed approach comprises a one-shot tuning phase followed by a generation phase.
The alignment score between the recovered and original data serves as a self-supervision navigator to refine the process.
arXiv Detail & Related papers (2023-06-07T11:33:14Z)
- Controlled Text Reduction [15.102190738450092]
We formalize Controlled Text Reduction as a standalone task, in which a source text is given together with pre-selected spans of target information.
A model then needs to generate a coherent text that includes all and only the target information.
arXiv Detail & Related papers (2022-10-24T17:59:03Z)
- Question-Based Salient Span Selection for More Controllable Text Summarization [67.68208237480646]
We propose a method for incorporating question-answering (QA) signals into a summarization model.
Our method identifies salient noun phrases (NPs) in the input document by automatically generating wh-questions that are answered by the NPs.
This QA-based signal is incorporated into a two-stage summarization model which first marks salient NPs in the input document using a classification model, then conditionally generates a summary.
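The first stage of the two-stage pipeline described above marks salient spans in the input before conditional generation. A minimal sketch of such a marking step — the `<sal>` marker tokens are an assumption for illustration, not the paper's vocabulary — could look like:

```python
# Sketch: wrap pre-identified salient spans in marker tokens so a
# downstream generator can condition on them. Spans are (start, end)
# token indices, end exclusive, assumed non-overlapping.

def mark_salient(tokens, salient_spans, open_tok="<sal>", close_tok="</sal>"):
    """Return a new token list with each salient span wrapped in markers."""
    starts = {s: e for s, e in salient_spans}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            out.append(open_tok)
            out.extend(tokens[i:starts[i]])
            out.append(close_tok)
            i = starts[i]
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "the new policy cut emissions sharply".split()
print(" ".join(mark_salient(tokens, [(1, 3), (4, 5)])))
# the <sal> new policy </sal> cut <sal> emissions </sal> sharply
```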
arXiv Detail & Related papers (2021-11-15T17:36:41Z)
- A Single Example Can Improve Zero-Shot Data Generation [7.237231992155901]
Sub-tasks of intent classification require extensive and flexible datasets for experiments and evaluation.
We propose to use text generation methods to gather datasets.
We explore two approaches to generating task-oriented utterances.
arXiv Detail & Related papers (2021-08-16T09:43:26Z)
- Centrality Meets Centroid: A Graph-based Approach for Unsupervised Document Summarization [13.12794447731674]
We propose a graph-based unsupervised approach for extractive document summarization.
Our approach works at a summary-level by utilizing graph centrality and centroid.
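The graph-based idea described above can be illustrated with a much-simplified sketch: build a sentence-similarity graph and rank sentences by degree centrality. This reduces the paper's summary-level centrality-plus-centroid method to a sentence-level degree heuristic, purely for illustration.

```python
# Sketch: extractive summarization via degree centrality on a
# sentence-similarity graph. Similarity is Jaccard token overlap;
# edges exist above a threshold; top-k central sentences are kept.

def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def summarize(sentences, k=1, edge_threshold=0.1):
    n = len(sentences)
    # Degree centrality: number of neighbours above the threshold.
    centrality = [
        sum(1 for j in range(n)
            if j != i and similarity(sentences[i], sentences[j]) >= edge_threshold)
        for i in range(n)
    ]
    ranked = sorted(range(n), key=lambda i: -centrality[i])
    # Restore document order among the selected sentences.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Sentences that overlap with many others sit at the graph's centre and are treated as salient; off-topic sentences end up isolated and are dropped.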
arXiv Detail & Related papers (2021-03-29T04:35:33Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
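The "linearized labeled sentences" mentioned above flatten a tagged sequence into one token stream, so an ordinary language model can be trained on labeled data and later sample new labeled sentences. A small sketch of one such linearization (the tag-before-token convention here is an assumption of this sketch):

```python
# Sketch: linearize a BIO-tagged sentence by interleaving each non-O
# tag before its token, producing a single stream suitable for
# language-model training.

def linearize(tokens, tags):
    """Interleave non-O tags with their tokens into one sequence."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(tok)
    return out

tokens = ["John", "lives", "in", "Paris"]
tags = ["B-PER", "O", "O", "B-LOC"]
print(" ".join(linearize(tokens, tags)))
# B-PER John lives in B-LOC Paris
```

Because the tags are ordinary tokens in the stream, sentences sampled from the trained model come out already labeled and can be de-linearized back into (token, tag) pairs.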
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
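The construction described above can be sketched abstractly: given a class hierarchy, assign disjoint subclasses of each superclass to the source and target distributions while keeping the superclass label space fixed. The even half-split below is a simplification for illustration, not the paper's exact procedure.

```python
# Sketch: build a subpopulation-shift split from a superclass ->
# subclasses hierarchy. Train and test share superclass labels but
# see disjoint subpopulations.

def subpopulation_split(hierarchy):
    """Split each superclass's subclasses into source and target halves."""
    source, target = {}, {}
    for superclass, subclasses in hierarchy.items():
        mid = len(subclasses) // 2
        source[superclass] = subclasses[:mid]
        target[superclass] = subclasses[mid:]
    return source, target

hierarchy = {
    "dog": ["beagle", "poodle", "husky", "corgi"],
    "cat": ["siamese", "persian"],
}
source, target = subpopulation_split(hierarchy)
```

A model trained to predict "dog" from beagles and poodles is then evaluated on huskies and corgis, isolating robustness to the subpopulation change from the classification task itself.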
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
- Proposal Learning for Semi-Supervised Object Detection [76.83284279733722]
It is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels.
We present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data.
arXiv Detail & Related papers (2020-01-15T00:06:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.