DiscoSum: Discourse-aware News Summarization
- URL: http://arxiv.org/abs/2506.06930v1
- Date: Sat, 07 Jun 2025 22:00:30 GMT
- Title: DiscoSum: Discourse-aware News Summarization
- Authors: Alexander Spangher, Tenghao Huang, Jialiang Gu, Jiatong Shi, Muhao Chen,
- Abstract summary: We introduce a novel approach to integrating discourse structure into summarization processes. We present a novel summarization dataset where news articles are summarized multiple times in different ways across different social media platforms. We develop a novel news discourse schema to describe summarization structures and a novel algorithm, DiscoSum, which employs a beam search technique for structure-aware summarization.
- Score: 79.4884227574627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text summarization have predominantly leveraged large language models to generate concise summaries. However, language models often do not maintain long-term discourse structure, especially in news articles, where organizational flow significantly influences reader engagement. We introduce a novel approach to integrating discourse structure into summarization processes, focusing specifically on news articles across various media. We present a novel summarization dataset where news articles are summarized multiple times in different ways across different social media platforms (e.g., LinkedIn, Facebook). We develop a novel news discourse schema to describe summarization structures and a novel algorithm, DiscoSum, which employs a beam search technique for structure-aware summarization, enabling the transformation of news stories to meet different stylistic and structural demands. Both human and automatic evaluation results demonstrate the efficacy of our approach in maintaining narrative fidelity and meeting structural requirements.
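The abstract names beam search as DiscoSum's search strategy but gives no details. The following is a minimal, hypothetical sketch of structure-constrained beam search, not the paper's algorithm: it assumes each candidate sentence already carries a discourse-role label and a content score, and that the desired summary structure is given as an ordered list of roles. The `Candidate` type, the role names, `structure_beam_search`, and the fixed `mismatch_penalty` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    text: str
    role: str     # hypothetical discourse label, e.g. "main_event"
    score: float  # hypothetical content score from some upstream model


def structure_beam_search(
    candidates: List[Candidate],
    target_roles: List[str],
    beam_width: int = 3,
    mismatch_penalty: float = 1.0,
) -> Tuple[List[Candidate], float]:
    """Fill each slot of `target_roles` with one unused candidate.

    A beam entry is (total_score, chosen_indices). At every slot, each beam
    is extended with every unused candidate; a candidate whose role differs
    from the required role pays `mismatch_penalty`. Only the `beam_width`
    highest-scoring partial summaries survive to the next slot.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for required_role in target_roles:
        expanded: List[Tuple[float, List[int]]] = []
        for total, chosen in beams:
            for i, cand in enumerate(candidates):
                if i in chosen:  # never reuse the same sentence
                    continue
                step = cand.score
                if cand.role != required_role:
                    step -= mismatch_penalty
                expanded.append((total + step, chosen + [i]))
        if not expanded:  # ran out of candidates before filling every slot
            break
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:beam_width]
    best_total, best_indices = beams[0]
    return [candidates[i] for i in best_indices], best_total


# Usage with made-up sentences and scores (illustrative only).
cands = [
    Candidate("Storm hits coast.", "main_event", 0.9),
    Candidate("Thousands lose power.", "consequence", 0.7),
    Candidate("Region saw a similar storm in 2010.", "background", 0.4),
]
summary, total = structure_beam_search(cands, ["main_event", "consequence"])
print([c.text for c in summary])  # ['Storm hits coast.', 'Thousands lose power.']
```

Keeping only `beam_width` partial summaries per slot bounds the work at roughly `beam_width × len(candidates)` expansions per slot, instead of enumerating every ordered selection of sentences. The soft `mismatch_penalty` is one design choice; discarding role-mismatched candidates outright would enforce the target structure as a hard constraint instead.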
Related papers
- Narrative Shift Detection: A Hybrid Approach of Dynamic Topic Models and Large Language Models [0.4649452333875421]
We propose a combination of the language understanding capabilities of Large Language Models with the large scale applicability of topic models to dynamically model narrative shifts across time. We employ our pipeline on a corpus of The Wall Street Journal newspaper articles from 2009 to 2023.
arXiv Detail & Related papers (2025-06-25T09:25:15Z)
- Talking Point based Ideological Discourse Analysis in News Events [62.18747509565779]
We propose a framework motivated by the theory of ideological discourse analysis to analyze news articles related to real-world events. Our framework represents the news articles using a relational structure - talking points, which captures the interaction between entities, their roles, and media frames along with a topic of discussion. We evaluate our framework's ability to generate these perspectives through automated tasks - ideology and partisan classification tasks, supplemented by human validation.
arXiv Detail & Related papers (2025-04-10T02:52:34Z)
- Mapping News Narratives Using LLMs and Narrative-Structured Text Embeddings [0.0]
We introduce a numerical narrative representation grounded in structuralist linguistic theory.
We extract the actants using an open-source LLM and integrate them into a Narrative-Structured Text Embedding.
We demonstrate the analytical insights of the method on the example of 5000 full-text news articles from Al Jazeera and The Washington Post on the Israel-Palestine conflict.
arXiv Detail & Related papers (2024-09-10T14:15:30Z)
- Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z)
- Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives.
We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions.
We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
arXiv Detail & Related papers (2023-06-03T08:50:13Z)
- Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding [37.62597275581973]
Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
We propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories.
A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performance than baselines.
arXiv Detail & Related papers (2023-04-08T20:41:15Z)
- Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework [68.1678127433077]
We extend the Transformer model to dynamically conduct entity state updates and sentence realization for narrative generation.
Experiments on two narrative datasets show that our model can generate more coherent and diverse narratives than strong baselines.
arXiv Detail & Related papers (2022-08-08T09:02:19Z)
- Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
- CompRes: A Dataset for Narrative Structure in News [2.4578723416255754]
We introduce CompRes -- the first dataset for narrative structure in news media.
We use the annotated dataset to train several supervised models to identify the different narrative elements.
arXiv Detail & Related papers (2020-07-09T15:21:59Z)
- Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z)
- The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries [72.48439126769627]
We introduce the Shmoop Corpus: a dataset of 231 stories paired with detailed multi-paragraph summaries for each individual chapter.
From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization.
We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.
arXiv Detail & Related papers (2019-12-30T21:03:59Z)