Related papers: End-to-End Segmentation-based News Summarization

URL: http://arxiv.org/abs/2110.07850v1
Date: Fri, 15 Oct 2021 04:17:26 GMT
Title: End-to-End Segmentation-based News Summarization
Authors: Yang Liu, Chenguang Zhu, Michael Zeng
Abstract summary: We introduce the task of segmenting a news article into multiple sections and generating the corresponding summary to each section. First, we create and make available a dataset, SegNews, consisting of 27k news articles with sections and aligned heading-style section summaries. Second, we propose a novel segmentation-based language generation model adapted from pre-trained language models.
Score: 15.549631631269198
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we bring a new way of digesting news content by introducing the task of segmenting a news article into multiple sections and generating the corresponding summary to each section. We make two contributions towards this new task. First, we create and make available a dataset, SegNews, consisting of 27k news articles with sections and aligned heading-style section summaries. Second, we propose a novel segmentation-based language generation model adapted from pre-trained language models that can jointly segment a document and produce the summary for each section. Experimental results on SegNews demonstrate that our model can outperform several state-of-the-art sequence-to-sequence generation models for this new task.

Related papers

DiscoSum: Discourse-aware News Summarization [79.4884227574627]
We introduce a novel approach to integrating discourse structure into summarization processes. We present a novel summarization dataset where news articles are summarized multiple times in different ways across different social media platforms. We develop a novel news discourse schema to describe summarization structures and a novel algorithm, DiscoSum, which employs beam search technique for structure-aware summarization.
arXiv Detail & Related papers (2025-06-07T22:00:30Z)
Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed. The appended summary should not only summarize the newly added content but also be coherent with the previous summary. We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse. We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
Universal Segmentation at Arbitrary Granularity with Language Instruction [59.76130089644841]
We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions. For training UniLSeg, we reorganize a group of tasks from original diverse distributions into a unified data format, where images with texts describing segmentation targets are the input and the corresponding masks are the output.
arXiv Detail & Related papers (2023-12-04T04:47:48Z)
VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task. The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video. The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z)
NEWTS: A Corpus for News Topic-Focused Summarization [9.872518517174498]
This paper introduces the first topical summarization corpus, based on the well-known CNN/Dailymail dataset. We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
arXiv Detail & Related papers (2022-05-31T10:01:38Z)
Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum. By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation. Our key idea is to decompose the holistic class representation into a set of part-aware prototypes. We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption. It remains a challenging research problem to efficiently and effectively generate a representative headline for each story. We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)
Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation [9.416757363901295]
We introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model -- a neural architecture consisting of two hierarchically connected Transformer networks -- is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones.
arXiv Detail & Related papers (2020-01-03T17:06:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.