Related papers: SemCAFE: When Named Entities make the Difference Assessing Web Source Reliability through Entity-level Analytics

URL: http://arxiv.org/abs/2504.08776v1
Date: Thu, 03 Apr 2025 22:14:43 GMT
Title: SemCAFE: When Named Entities make the Difference Assessing Web Source Reliability through Entity-level Analytics
Authors: Gautam Kishore Shahi, Oshani Seneviratne, Marc Spaniol
Abstract summary: SemCAFE is a system designed to detect news reliability by incorporating entity relatedness into its assessment. By creating a semantic fingerprint for each news article, SemCAFE could assess the credibility of 46,020 reliable and 3,407 unreliable articles on the 2022 Russian invasion of Ukraine.
Score: 5.919180820181465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the shift from traditional to digital media, the online landscape now hosts not only reliable news articles but also a significant amount of unreliable content. Digital media reaches audiences faster, significantly influencing public opinion and advancing political agendas. While newspaper readers may be familiar with their preferred outlets' political leanings or credibility, identifying unreliable news articles online is much more challenging. The credibility of many online sources is often opaque, with AI-generated content being easily disseminated at minimal cost. Unreliable news articles, particularly those that followed the Russian invasion of Ukraine in 2022, closely mimic the topics and writing styles of credible sources, making them difficult to distinguish. To address this, we introduce SemCAFE, a system designed to detect news reliability by incorporating entity relatedness into its assessment. SemCAFE employs standard Natural Language Processing techniques, such as boilerplate removal and tokenization, alongside entity-level semantic analysis using the YAGO knowledge base. By creating a semantic fingerprint for each news article, SemCAFE could assess the credibility of 46,020 reliable and 3,407 unreliable articles on the 2022 Russian invasion of Ukraine. Our approach improved the macro F1 score by 12% over state-of-the-art methods. The sample data and code are available on GitHub.
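
Note on the metric: the abstract reports macro F1, which averages per-class F1 without weighting by class size. The sketch below is not taken from the SemCAFE code; it is a minimal illustration, assuming a hypothetical baseline that labels every article "reliable", of why macro F1 is more informative than accuracy on a corpus this imbalanced. Only the two class sizes come from the abstract; the function name and the baseline are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Class sizes reported in the abstract
N_RELIABLE, N_UNRELIABLE = 46_020, 3_407

y_true = ["reliable"] * N_RELIABLE + ["unreliable"] * N_UNRELIABLE
# Hypothetical baseline (an assumption, not a method from the paper):
# predict "reliable" for every article
y_pred = ["reliable"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, ["reliable", "unreliable"])

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {score:.3f}")
```

Under this assumed baseline, accuracy stays high because reliable articles dominate, while macro F1 drops sharply because the unreliable class contributes an F1 of zero.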

Related papers

CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English [53.32175252285023]
Cross-lingual news comparison offers a promising approach to verify information. Existing datasets for cross-lingual news analysis were manually curated by journalists and experts. We introduce a scalable, explainable crowdsourcing pipeline for cross-lingual news similarity assessment.
arXiv Detail & Related papers (2025-10-22T14:23:50Z)
Tracking the Takes and Trajectories of English-Language News Narratives across Trustworthy and Worrisome Websites [4.592124824937116]
We identify and track news narratives and their attitudes across over 4,000 factually unreliable, mixed-reliability, and factually reliable English-language news websites. We show that the paths of news narratives and the stances of websites toward particular entities can be used to uncover slanted propaganda networks. We hope that increased visibility into our distributed news ecosystem can help with the reporting and fact-checking of propaganda and disinformation.
arXiv Detail & Related papers (2025-01-15T19:37:44Z)
News and Misinformation Consumption in Europe: A Longitudinal Cross-Country Perspective [49.1574468325115]
This study investigated information consumption in four European countries. It analyzed three years of Twitter activity from news outlet accounts in France, Germany, Italy, and the UK. Results indicate that reliable sources dominate the information landscape, although unreliable content is still present across all countries.
arXiv Detail & Related papers (2023-11-09T16:22:10Z)
Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news. Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context [25.870137795858522]
We present NEWSSENSE, a novel sensemaking tool and reading interface designed to collect and integrate information from multiple news articles on a central topic. NEWSSENSE augments a central, grounding article of the user's choice by linking it to related articles from different sources. Our pilot study shows that NEWSSENSE has the potential to help users identify key information, verify the credibility of news articles, and explore different perspectives.
arXiv Detail & Related papers (2023-10-06T21:15:11Z)
Bias or Diversity? Unraveling Fine-Grained Thematic Discrepancy in U.S. News Headlines [63.52264764099532]
We use a large dataset of 1.8 million news headlines from major U.S. media outlets spanning from 2014 to 2022. We quantify the fine-grained thematic discrepancy related to four prominent topics - domestic politics, economic issues, social issues, and foreign affairs. Our findings indicate that on domestic politics and social issues, the discrepancy can be attributed to a certain degree of media bias.
arXiv Detail & Related papers (2023-03-28T03:31:37Z)
Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases. We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions. Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
Stance Detection with BERT Embeddings for Credibility Analysis of Information on Social Media [1.7616042687330642]
We propose a model for detecting fake news using stance as one of the features along with the content of the article. Our work interprets the content with automatic feature extraction and the relevance of the text pieces. The experiment conducted on the real-world dataset indicates that our model outperforms the previous work and enables fake news detection with an accuracy of 95.32%.
arXiv Detail & Related papers (2021-05-21T10:46:43Z)
Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation. We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines. Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv Detail & Related papers (2021-04-18T09:50:11Z)
Supporting verification of news articles with automated search for semantically similar articles [0.0]
We propose an evidence retrieval approach to handle fake news. The learning task is formulated as an unsupervised machine learning problem. We find that our approach is agnostic to concept drifts, i.e. the machine learning task is independent of the hypotheses in a text.
arXiv Detail & Related papers (2021-03-29T12:56:59Z)
A Survey on Predicting the Factuality and the Bias of News Media [29.032850263311342]
"The state of the art on media profiling for factuality and bias" "Political bias detection, which in the Western political landscape is about predicting left-center-right bias" "Recent advances in using different information sources and modalities"
arXiv Detail & Related papers (2021-03-16T11:11:54Z)
Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal. Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.