Unsupervised Story Discovery from Continuous News Streams via Scalable
Thematic Embedding
- URL: http://arxiv.org/abs/2304.04099v3
- Date: Thu, 4 May 2023 04:36:23 GMT
- Title: Unsupervised Story Discovery from Continuous News Streams via Scalable
Thematic Embedding
- Authors: Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han
- Abstract summary: Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
We propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories.
A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performances than baselines.
- Score: 37.62597275581973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised discovery of stories with correlated news articles in real-time
helps people digest massive news streams without expensive human annotations. A
common approach of the existing studies for unsupervised online story discovery
is to represent news articles with symbolic- or graph-based embedding and
incrementally cluster them into stories. Recent large language models are
expected to improve the embedding further, but a straightforward adoption of
the models by indiscriminately encoding all information in articles is
ineffective to deal with text-rich and evolving news streams. In this work, we
propose a novel thematic embedding with an off-the-shelf pretrained sentence
encoder to dynamically represent articles and stories by considering their
shared temporal themes. To realize the idea for unsupervised online story
discovery, a scalable framework USTORY is introduced with two main techniques,
theme- and time-aware dynamic embedding and novelty-aware adaptive clustering,
fueled by lightweight story summaries. A thorough evaluation with real news
data sets demonstrates that USTORY achieves higher story discovery performances
than baselines while being robust and scalable to various streaming settings.
Related papers
- Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models [79.21968152209193]
We introduce the NewEpisode benchmark to evaluate generative models' adaptability in generating new stories with fresh characters.
We propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters seamlessly integrating them into established character dynamics.
arXiv Detail & Related papers (2024-05-20T07:54:03Z) - Improving Sequence-to-Sequence Models for Abstractive Text Summarization Using Meta Heuristic Approaches [0.0]
Humans have a unique ability to create abstractions.
The use of sequence-to-sequence (seq2seq) models for neural abstractive text summarization has been ascending as far as prevalence.
In this article, we aim toward enhancing the present architectures and models for abstractive text summarization.
arXiv Detail & Related papers (2024-03-24T17:39:36Z) - SCStory: Self-supervised and Continual Online Story Discovery [53.72745249384159]
SCStory helps people digest rapidly published news article streams in real-time without human annotations.
SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams.
arXiv Detail & Related papers (2023-11-27T04:50:01Z) - Conflicts, Villains, Resolutions: Towards models of Narrative Media
Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives.
We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions.
We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
arXiv Detail & Related papers (2023-06-03T08:50:13Z) - Generating Coherent Narratives by Learning Dynamic and Discrete Entity
States with a Contrastive Framework [68.1678127433077]
We extend the Transformer model to dynamically conduct entity state updates and sentence realization for narrative generation.
Experiments on two narrative datasets show that our model can generate more coherent and diverse narratives than strong baselines.
arXiv Detail & Related papers (2022-08-08T09:02:19Z) - StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611]
We present StreamHover, a framework for annotating and summarizing livestream transcripts.
With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora.
We show that our model generalizes better and improves performance over strong baselines.
arXiv Detail & Related papers (2021-09-11T02:19:37Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z) - BaitWatcher: A lightweight web interface for the detection of
incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines.
BaiittWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z) - Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption.
It remains a challenging research problem to efficiently and effectively generate a representative headline for each story.
We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.