Controlled Text Reduction
- URL: http://arxiv.org/abs/2210.13449v1
- Date: Mon, 24 Oct 2022 17:59:03 GMT
- Title: Controlled Text Reduction
- Authors: Aviv Slobodkin, Paul Roit, Eran Hirsch, Ori Ernst, Ido Dagan
- Abstract summary: We formalize Controlled Text Reduction as a standalone task.
A model then needs to generate a coherent text that includes all and only the target information.
- Score: 15.102190738450092
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Producing a reduced version of a source text, as in generic or focused
summarization, inherently involves two distinct subtasks: deciding on targeted
content and generating a coherent text conveying it. While some popular
approaches address summarization as a single end-to-end task, prominent works
support decomposed modeling for individual subtasks. Further, semi-automated
text reduction is also very appealing, where users may identify targeted
content while models would generate a corresponding coherent summary.
In this paper, we focus on the second subtask, of generating coherent text
given pre-selected content. Concretely, we formalize \textit{Controlled Text
Reduction} as a standalone task, whose input is a source text with marked spans
of targeted content ("highlighting"). A model then needs to generate a coherent
text that includes all and only the target information. We advocate the
potential of such models, both for modular fully-automatic summarization and
for semi-automated human-in-the-loop use cases. To facilitate proper
research, we crowdsource high-quality dev and test datasets for the task.
Further, we automatically generate a larger "silver" training dataset from
available summarization benchmarks, leveraging a pretrained summary-source
alignment model. Finally, employing these datasets, we present a supervised
baseline model, showing promising results and insightful analyses.
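To make the task input concrete, below is a minimal sketch of one plausible encoding: each highlighted span is wrapped in marker tokens so a downstream seq2seq model can condition on exactly the targeted content. The <h>/</h> markers, the mark_highlights helper, and the character-offset span format are illustrative assumptions, not the paper's confirmed implementation.

```python
# Minimal sketch of a Controlled Text Reduction input encoding.
# ASSUMPTION: the <h>...</h> marker tokens and character-offset spans
# are illustrative choices, not the paper's confirmed format.

def mark_highlights(source: str, spans: list) -> str:
    """Wrap each (start, end) character span of `source` in highlight markers."""
    pieces, prev = [], 0
    for start, end in sorted(spans):
        pieces.append(source[prev:start])
        pieces.append("<h>" + source[start:end] + "</h>")
        prev = end
    pieces.append(source[prev:])
    return "".join(pieces)

source = ("The storm hit the coast on Monday. Officials evacuated "
          "3,000 residents. Local schools stayed open.")
spans = [(0, 34), (35, 71)]  # highlight the first two sentences only
print(mark_highlights(source, spans))
# -> "<h>The storm hit the coast on Monday.</h> <h>Officials evacuated
#     3,000 residents.</h> Local schools stayed open."
```

The marked text would then be fed to a fine-tuned seq2seq model (the paper reports a supervised baseline) trained to generate a coherent summary that covers the highlighted spans and omits the unhighlighted third sentence.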
Related papers
- Personalized Video Summarization using Text-Based Queries and Conditional Modeling [3.4447129363520337]
This thesis explores enhancing video summarization by integrating text-based queries and conditional modeling.
Evaluation metrics such as accuracy and F1-score assess the quality of the generated summaries.
(arXiv, 2024-08-27)
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
(arXiv, 2024-06-02)
- NEWTS: A Corpus for News Topic-Focused Summarization [9.872518517174498]
This paper introduces the first topical summarization corpus, based on the well-known CNN/DailyMail dataset.
We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
(arXiv, 2022-05-31)
- Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high-quality, human-labeled dataset to support research into the task.
(arXiv, 2022-04-15)
- Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics (see the first sketch after this list).
We utilize the lesser-used and challenging WikiHow dataset in our approach to text summarization.
(arXiv, 2021-06-29)
- Automated News Summarization Using Transformers [4.932130498861987]
We present a comprehensive comparison of several transformer-based pre-trained models for text summarization.
For analysis and comparison, we use the BBC news dataset, which contains articles suitable for summarization together with human-generated summaries.
(arXiv, 2021-04-23)
- Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline [94.0601799665342]
Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task.
We propose establishing summary-source alignment as an explicit task, while introducing two major novelties.
We create a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data.
We present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
(arXiv, 2020-09-01)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
(arXiv, 2020-04-30)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper shifts the paradigm for building neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem (see the second sketch after this list).
This drives the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 ROUGE-1).
(arXiv, 2020-04-19)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq-based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
(arXiv, 2020-04-04)
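Two illustrative sketches follow for the more mechanism-oriented entries above. First, for the topic-modeling entry: a toy version that clusters a document's sentences into latent topics with LDA and keeps the most representative sentence per topic. The topic_summary helper, the use of scikit-learn's LatentDirichletAllocation, and the two-topic setting are assumptions for illustration, not that paper's exact method.

```python
# Toy sketch of topic-model-based extractive summarization.
# ASSUMPTION: this is a generic LDA pipeline for illustration,
# not the cited paper's exact method.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def topic_summary(sentences, n_topics=2):
    """Keep the most topic-representative sentence for each latent topic."""
    counts = CountVectorizer(stop_words="english").fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topic = lda.fit_transform(counts)  # sentence-by-topic weight matrix
    picked = {int(doc_topic[:, t].argmax()) for t in range(n_topics)}
    return [sentences[i] for i in sorted(picked)]  # preserve source order

sentences = [
    "Whisk the eggs with milk and a pinch of salt.",
    "Cook the mixture slowly over low heat.",
    "Back up your files before upgrading the system.",
    "Run the installer and follow the on-screen steps.",
]
print(topic_summary(sentences, n_topics=2))
```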
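Second, for the text-matching entry: the key idea is to score whole candidate summaries against the source document rather than scoring sentences independently. In this toy version, TF-IDF cosine similarity stands in for the learned semantic matching model of the actual paper.

```python
# Toy sketch of "extractive summarization as text matching".
# ASSUMPTION: TF-IDF cosine similarity replaces the paper's learned
# matching model; candidate summaries are scored as whole texts.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def best_extract(sentences, k=2):
    """Return the k-sentence subset whose concatenation best matches the document."""
    document = " ".join(sentences)
    candidates = [" ".join(c) for c in combinations(sentences, k)]
    vectorizer = TfidfVectorizer().fit([document] + candidates)
    doc_vec = vectorizer.transform([document])
    cand_vecs = vectorizer.transform(candidates)
    scores = cosine_similarity(cand_vecs, doc_vec).ravel()
    return candidates[int(scores.argmax())]

sentences = [
    "The company reported record quarterly revenue.",
    "Profits rose 12 percent year over year.",
    "The CEO thanked employees at the annual picnic.",
]
print(best_extract(sentences, k=2))
```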