Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation
- URL: http://arxiv.org/abs/2007.07841v1
- Date: Wed, 15 Jul 2020 17:03:34 GMT
- Title: Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation
- Authors: Paul Tardy, David Janiszek, Yannick Estève, Vincent Nguyen
- Abstract summary: The state of the art in automatic text summarization mostly revolves around news articles.
Our work consists of segmenting and aligning transcriptions with respect to reports to obtain a suitable dataset for neural summarization.
We report automatic alignment and summarization performance on a novel corpus of aligned public meetings.
- Score: 8.029049649310211
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Summarizing texts is not a straightforward task. Before even considering text
summarization, one should determine what kind of summary is expected. How much
should the information be compressed? Is it relevant to reformulate or should
the summary stick to the original phrasing? The state of the art in automatic
text summarization mostly revolves around news articles. We suggest that considering
a wider variety of tasks would lead to an improvement in the field, in terms of
generalization and robustness. We explore meeting summarization: generating
reports from automatic transcriptions. Our work consists of segmenting and
aligning transcriptions with respect to reports to obtain a suitable dataset for
neural summarization. Using a bootstrapping approach, we provide pre-alignments
that are corrected by human annotators, yielding a validation set against which
we evaluate automatic models. This consistently reduces annotators' effort by
providing iteratively better pre-alignments and maximizes the corpus size by
using annotations from our automatic alignment models. Evaluation is conducted
on public_meetings, a novel corpus of aligned public meetings. We report
automatic alignment and summarization performance on this corpus and show that
automatic alignment is relevant for data annotation, since it leads to a large
improvement of almost +4 points on all ROUGE scores on the summarization task.
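
The pre-alignment step pairs report sentences with transcription segments before annotators correct them. Below is a minimal sketch of one way to produce such pre-alignments, assuming TF-IDF cosine similarity as the scoring function and a fixed confidence threshold (both are assumptions here, not the paper's exact models):

```python
# A minimal pre-alignment sketch. Assumes TF-IDF cosine similarity as the
# scoring function and a fixed confidence threshold; the paper's actual
# segmentation and alignment models may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pre_align(transcript_segments, report_sentences, threshold=0.2):
    """Pair each report sentence with its best-matching transcript segment."""
    vectorizer = TfidfVectorizer()
    # Fit on both sides so segments and sentences share one vocabulary.
    vectorizer.fit(transcript_segments + report_sentences)
    seg_vecs = vectorizer.transform(transcript_segments)
    sent_vecs = vectorizer.transform(report_sentences)
    scores = cosine_similarity(sent_vecs, seg_vecs)  # (n_sentences, n_segments)
    alignments = []
    for i, row in enumerate(scores):
        j = int(row.argmax())
        if row[j] >= threshold:  # keep only confident pairs for annotators
            alignments.append((i, j, float(row[j])))
    return alignments
```

In the bootstrapping loop described above, annotators would correct these candidate pairs, and the corrected data would train a stronger alignment model for the next round.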
Related papers
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces, user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
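
As a rough illustration of the incremental centroid idea (a toy sketch only; the paper's CentroidRank definition and its cover-tree index are not reproduced here), a summarizer can keep a running centroid of sentence embeddings and return the k sentences closest to it:

```python
import numpy as np

class IncrementalCentroidSummarizer:
    """Toy incremental centroid ranker: each new review sentence updates a
    running centroid, and the summary is the k sentences closest to it.
    This sketch omits the cover-tree index the paper uses for efficiency."""

    def __init__(self, k=3):
        self.k = k
        self.sentences = []
        self.embeddings = []      # one vector per sentence seen so far
        self.centroid_sum = None  # running sum, so updates are O(1)

    def add(self, sentence, embedding):
        self.sentences.append(sentence)
        self.embeddings.append(embedding)
        self.centroid_sum = embedding if self.centroid_sum is None \
            else self.centroid_sum + embedding

    def summary(self):
        centroid = self.centroid_sum / len(self.embeddings)
        dists = [np.linalg.norm(e - centroid) for e in self.embeddings]
        top = np.argsort(dists)[: self.k]  # closest to the centroid first
        return [self.sentences[i] for i in top]
```

The brute-force re-ranking here is linear in the number of sentences per query; the cover trees in the paper are what make updates and queries efficient at scale.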
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- Controlled Text Reduction [15.102190738450092]
We formalize Controlled Text Reduction as a standalone task: the input is a source document with pre-selected ("highlighted") spans.
A model then needs to generate a coherent text that includes all and only the target information.
arXiv Detail & Related papers (2022-10-24T17:59:03Z)
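
A common way to encode such highlighted inputs for a seq2seq model is to wrap the pre-selected spans in marker tokens; the sketch below assumes hypothetical <h>...</h> markers rather than the paper's exact input format:

```python
def mark_highlights(document: str, spans: list[tuple[int, int]]) -> str:
    """Wrap pre-selected (start, end) character spans in highlight markers.
    The marked string is the model input; the target output is a coherent
    text covering all and only the highlighted content."""
    out, prev = [], 0
    for start, end in sorted(spans):
        out.append(document[prev:start])
        out.append("<h> " + document[start:end] + " </h>")
        prev = end
    out.append(document[prev:])
    return "".join(out)

# Example: highlight the second sentence of a two-sentence document.
doc = "The council met on Monday. It approved the new budget."
print(mark_highlights(doc, [(27, 55)]))
# -> The council met on Monday. <h> It approved the new budget. </h>
```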
- Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z)
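
As a toy illustration of expectation-based soft labels (the paper's exact estimator may differ), a sentence's label can be taken as the fraction of sampled oracle extractions that include it:

```python
from collections import Counter

def soft_labels(n_sentences: int, oracle_summaries: list[set[int]]) -> list[float]:
    """Soft label for each sentence = fraction of sampled oracle summaries
    that include it, approximating the expectation of summary membership."""
    counts = Counter(i for oracle in oracle_summaries for i in oracle)
    return [counts[i] / len(oracle_summaries) for i in range(n_sentences)]

# Three plausible oracle extractions for a 5-sentence document:
oracles = [{0, 2}, {0, 3}, {0, 2}]
print(soft_labels(5, oracles))  # [1.0, 0.0, 0.667, 0.333, 0.0]
```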
- A General Contextualized Rewriting Framework for Text Summarization [15.311467109946571]
Existing rewriting systems take each extractive sentence as the only input, which is relatively focused but can lose necessary background knowledge and discourse context.
We formalize contextualized rewriting as a seq2seq problem with group-tag alignments, identifying extractive sentences through content-based addressing.
Results show that our approach significantly outperforms non-contextualized rewriting systems without requiring reinforcement learning.
arXiv Detail & Related papers (2022-07-13T03:55:57Z)
- SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z)
- Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
arXiv Detail & Related papers (2022-05-04T16:38:37Z)
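
For context on the adapter-based few-shot method above: a bottleneck adapter in the spirit of Houlsby et al. inserts a small trainable down-/up-projection with a residual connection into each transformer layer, so only the adapter weights are updated. A minimal PyTorch sketch (not necessarily this paper's exact adapter variant):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual.
    During few-shot fine-tuning only these parameters are trained, letting a
    small module store in-domain knowledge on top of a frozen base model."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the base model's behavior recoverable.
        return x + self.up(self.act(self.down(x)))
```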
- Automated News Summarization Using Transformers [4.932130498861987]
We present a comprehensive comparison of several transformer-based pre-trained models for text summarization.
For analysis and comparison, we use the BBC news dataset, which contains news articles and corresponding human-generated summaries.
arXiv Detail & Related papers (2021-04-23T04:22:33Z)
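
A comparison like the one above can be reproduced with the Hugging Face pipeline API; a minimal sketch, assuming BART and T5 checkpoints stand in for the compared models (the paper's exact model list is not reproduced here):

```python
from transformers import pipeline

# Two commonly compared pre-trained summarizers; swap in any seq2seq
# checkpoint to extend the comparison.
models = ["facebook/bart-large-cnn", "t5-base"]

article = (
    "The city council voted on Tuesday to approve a new cycling lane along "
    "the riverfront, a project that had been debated for several months."
)

for name in models:
    summarizer = pipeline("summarization", model=name)
    summary = summarizer(article, max_length=30, min_length=10)[0]["summary_text"]
    print(f"{name}: {summary}")
```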
- Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization [6.465251961564605]
Supervised approaches for Neural Abstractive Summarization require large annotated corpora that are costly to build.
We present a French meeting summarization task where reports are predicted based on the automatic transcription of the meeting audio recordings.
We report large improvements compared to the previous baseline for both approaches on two evaluation sets.
arXiv Detail & Related papers (2020-07-30T08:22:47Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
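
As a toy version of this matching formulation (using sentence-transformers embeddings as an assumption; the paper trains a Siamese-BERT matching model), candidate extractive summaries can be scored holistically against the document rather than sentence by sentence:

```python
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

def best_candidate(document: str, sentences: list[str], k: int = 2) -> list[str]:
    """Score every k-sentence candidate summary against the full document
    and return the candidate whose embedding matches the document best."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = model.encode(document, convert_to_tensor=True)
    best, best_score = None, float("-inf")
    for cand in combinations(sentences, k):
        cand_emb = model.encode(" ".join(cand), convert_to_tensor=True)
        score = util.cos_sim(doc_emb, cand_emb).item()
        if score > best_score:
            best, best_score = list(cand), score
    return best
```

Enumerating every k-sentence candidate is combinatorial; in practice the candidate set is pruned first, for example with a sentence-level extractor, as MatchSum does.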