SCStory: Self-supervised and Continual Online Story Discovery
- URL: http://arxiv.org/abs/2312.03725v1
- Date: Mon, 27 Nov 2023 04:50:01 GMT
- Title: SCStory: Self-supervised and Continual Online Story Discovery
- Authors: Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han
- Abstract summary: SCStory helps people digest rapidly published news article streams in real time without human annotations.
SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams.
- Score: 53.72745249384159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SCStory, a framework for online story discovery that helps
people digest rapidly published news article streams in real time without human
annotations. To organize news article streams into stories, existing approaches
directly encode the articles and cluster them based on representation
similarity. However, these methods yield noisy and inaccurate story discovery
results because the generic article embeddings do not effectively reflect the
story-indicative semantics in an article and cannot adapt to the rapidly
evolving news article streams. SCStory employs self-supervised and continual
learning with a novel idea of story-indicative adaptive modeling of news
article streams. With a lightweight hierarchical embedding module that first
learns sentence representations and then article representations, SCStory
identifies story-relevant information in news articles and uses it to
discover stories. The embedding module is continuously updated to adapt to
evolving news streams with a contrastive learning objective, supported by two
techniques, confidence-aware memory replay and prioritized augmentation, which
address the absence of labels and the scarcity of data. Thorough experiments on
real and recent news data sets demonstrate that SCStory outperforms
existing state-of-the-art algorithms for unsupervised online story discovery.
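For orientation, the baseline pipeline the abstract describes (encode each article, then cluster by representation similarity) might look roughly like the sketch below. It is not SCStory's method: it has no story-indicative embedding, contrastive update, or memory replay. The class name `OnlineStoryClusterer`, the similarity threshold, and the toy 2-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class OnlineStoryClusterer:
    """Hypothetical baseline: assign each incoming article embedding to the
    most similar story centroid, or open a new story if none is close enough."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value, not taken from the paper
        self.sums = []    # per-story sum of member embeddings
        self.counts = []  # per-story number of member articles

    def assign(self, emb):
        best_id, best_sim = -1, -1.0
        for sid, (total, n) in enumerate(zip(self.sums, self.counts)):
            centroid = [x / n for x in total]
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best_id, best_sim = sid, sim
        if best_id >= 0 and best_sim >= self.threshold:
            # Fold the article into the existing story's running centroid.
            self.sums[best_id] = [a + b for a, b in zip(self.sums[best_id], emb)]
            self.counts[best_id] += 1
            return best_id
        # No sufficiently similar story: start a new one.
        self.sums.append(list(emb))
        self.counts.append(1)
        return len(self.sums) - 1


# Toy stream of article embeddings (made-up values).
clusterer = OnlineStoryClusterer(threshold=0.8)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
story_ids = [clusterer.assign(e) for e in stream]
```

In a sketch like this the embeddings are fixed, which is the limitation the abstract points to: generic embeddings neither emphasize story-indicative content nor adapt as the stream evolves.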
Related papers
- A Novel Method for News Article Event-Based Embedding [8.183446952097528]
We propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles.
We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method.
Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.
arXiv Detail & Related papers (2024-05-20T20:55:07Z)
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets a new state of the art for few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
- Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding [37.62597275581973]
Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
We propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories.
A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performance than baselines.
arXiv Detail & Related papers (2023-04-08T20:41:15Z)
- Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework [68.1678127433077]
We extend the Transformer model to dynamically conduct entity state updates and sentence realization for narrative generation.
Experiments on two narrative datasets show that our model can generate more coherent and diverse narratives than strong baselines.
arXiv Detail & Related papers (2022-08-08T09:02:19Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation, by 3.62 to 7.69% in F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Tell Me Who Your Friends Are: Using Content Sharing Behavior for News Source Veracity Detection [3.359647717705252]
We propose a novel and robust news veracity detection model that uses the content sharing behavior of news sources formulated as a network.
We show that state-of-the-art writing style and CSN features make diverse mistakes when predicting, meaning that they play different roles in the classification task.
arXiv Detail & Related papers (2021-01-15T21:39:51Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- Fake News Detection on News-Oriented Heterogeneous Information Networks through Hierarchical Graph Attention [12.250335118888891]
We propose a novel fake news detection framework, namely Hierarchical Graph Attention Network (HGAT).
HGAT uses a novel hierarchical attention mechanism to perform node representation learning in HIN, and then detects fake news by classifying news article nodes.
Experiments on two real-world fake news datasets show that HGAT can outperform text-based models and other network-based models.
arXiv Detail & Related papers (2020-02-05T19:09:13Z)
- Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption.
It remains a challenging research problem to efficiently and effectively generate a representative headline for each story.
We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)