A Novel Method for News Article Event-Based Embedding
- URL: http://arxiv.org/abs/2405.13071v2
- Date: Fri, 2 Aug 2024 09:30:03 GMT
- Title: A Novel Method for News Article Event-Based Embedding
- Authors: Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach
- Abstract summary: We propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles.
We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method.
Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.
- Score: 8.183446952097528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and making news recommendations. However, existing news embedding methods are not optimized to capture the latent context of news events. Most embedding methods rely on full-text information and neglect time-relevant embedding generation. In this paper, we propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract events, entities, and themes from the given news articles. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method. We conducted a comparative analysis of different news embedding generation methods for validation. Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.
Related papers
- SCStory: Self-supervised and Continual Online Story Discovery [53.72745249384159]
SCStory helps people digest rapidly published news article streams in real-time without human annotations.
SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams.
arXiv Detail & Related papers (2023-11-27T04:50:01Z)
- TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection [15.386007761649251]
We propose a novel Title-Text similarity and emotion-aware Fake news detection (TieFake) method by jointly modeling the multi-modal context information and the author sentiment.
Specifically, we employ BERT and ResNeSt to learn the representations for text and images, and utilize publisher emotion extractor to capture the author's subjective emotion in the news content.
arXiv Detail & Related papers (2023-04-19T04:47:36Z)
- Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding [37.62597275581973]
Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
We propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories.
A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performances than baselines.
arXiv Detail & Related papers (2023-04-08T20:41:15Z)
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Neural News Recommendation with Event Extraction [0.0]
A key challenge of online news recommendation is to help users find articles they are interested in.
Traditional news recommendation methods usually use single news information, which is insufficient to encode news and user representation.
We propose an Event Extraction-based News Recommendation framework to overcome these shortcomings.
arXiv Detail & Related papers (2021-11-09T11:56:38Z)
- Joint Multimedia Event Extraction from Video and Article [51.159034070824056]
We propose the first approach to jointly extract events from video and text articles.
First, we propose the first self-supervised multimodal event coreference model.
Second, we introduce the first multimodal transformer which extracts structured event information jointly from both videos and text documents.
arXiv Detail & Related papers (2021-09-27T03:22:12Z)
- Embracing Domain Differences in Fake News: Cross-domain Fake News Detection using Multi-modal Data [18.66426327152407]
We propose a novel framework that jointly preserves domain-specific and cross-domain knowledge in news records to detect fake news from different domains.
Our experiments show that the integration of the proposed fake news model and the selective annotation approach achieves state-of-the-art performance for cross-domain news datasets.
arXiv Detail & Related papers (2021-02-11T23:31:14Z)
- Cross-media Structured Common Space for Multimedia Event Extraction [82.36301617438268]
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information into a common embedding space.
By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
arXiv Detail & Related papers (2020-05-05T20:21:53Z)
- Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption.
It remains a challenging research problem to efficiently and effectively generate a representative headline for each story.
We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.