SciLander: Mapping the Scientific News Landscape
- URL: http://arxiv.org/abs/2205.07970v1
- Date: Mon, 16 May 2022 20:20:43 GMT
- Title: SciLander: Mapping the Scientific News Landscape
- Authors: Maurício Gruppi, Panayiotis Smeros, Sibel Adalı, Carlos Castillo, Karl Aberer
- Abstract summary: We introduce SciLander, a method for learning representations of news sources reporting on science-based topics.
We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources spanning a period of 18 months since the beginning of the pandemic in 2020.
- Score: 8.504643390943409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic has fueled the spread of misinformation on social media
and the Web as a whole. The phenomenon dubbed "infodemic" has taken the
challenges of information veracity and trust to new heights by massively
introducing seemingly scientific and technical elements into misleading
content. Despite the existing body of work on modeling and predicting
misinformation, the coverage of very complex scientific topics with inherent
uncertainty and an evolving set of findings, such as COVID-19, provides many
new challenges that are not easily solved by existing tools. To address these
issues, we introduce SciLander, a method for learning representations of news
sources reporting on science-based topics. SciLander extracts four
heterogeneous indicators for the news sources: two generic indicators that
capture (1) the copying of news stories between sources, and (2) the use of the
same terms to mean different things (i.e., the semantic shift of terms), and
two scientific indicators that capture (1) the usage of jargon and (2) the
stance towards specific citations. We use these indicators as signals of source
agreement, sampling pairs of positive (similar) and negative (dissimilar)
samples, and combine them in a unified framework to train unsupervised news
source embeddings with a triplet margin loss objective. We evaluate our method
on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources
spanning a period of 18 months since the beginning of the pandemic in 2020. Our
results show that the features learned by our model outperform state-of-the-art
baseline methods on the task of news veracity classification. Furthermore, a
clustering analysis suggests that the learned representations encode
information about the reliability, political leaning, and partisanship bias of
these sources.
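The triplet margin loss objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin of 1.0, the source names, and the 2-D vectors are all assumptions chosen for illustration. The idea is that an anchor source should lie closer to a source it agrees with (the positive) than to one it disagrees with (the negative), by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`. The margin value is an assumption.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D source embeddings (illustrative values only).
embeddings = {
    "source_a": [0.0, 0.0],
    "source_b": [3.0, 4.0],
    "source_c": [0.0, 1.0],
}

# Violated triplet: the "similar" source_b is farther from source_a than
# the "dissimilar" source_c, so the loss is positive.
violated = triplet_margin_loss(
    embeddings["source_a"], embeddings["source_b"], embeddings["source_c"]
)

# Satisfied triplet: swapping the roles puts the positive well inside the
# margin, so the loss is zero.
satisfied = triplet_margin_loss(
    embeddings["source_a"], embeddings["source_c"], embeddings["source_b"]
)

print(violated, satisfied)
```

In training, the gradient of this loss would pull the positive toward the anchor and push the negative away; how triplets are sampled from the four indicators is described in the paper itself and is not reproduced here.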
Related papers
- Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions [0.7249731529275342]
We propose an extension to a recently presented news media reliability estimation method.
We assess the classification performance of four reinforcement learning strategies on a large news media hyperlink graph.
Our experiments, targeting two challenging bias descriptors, factual reporting and political bias, showed a significant performance improvement at the source media level.
arXiv Detail & Related papers (2024-10-23T08:18:26Z) - Knowledge Graph Representation for Political Information Sources [16.959319157216466]
We analyze data collected from two news portals, Breitbart News (BN) and New York Times (NYT).
Our research findings are presented through knowledge graphs, utilizing a dataset spanning 11.5 years gathered from BN and NYT media portals.
arXiv Detail & Related papers (2024-04-04T13:36:01Z) - Can Large Language Models Detect Misinformation in Scientific News
Reporting? [1.0344642971058586]
This paper investigates whether it is possible to use large language models (LLMs) to detect misinformation in scientific reporting.
We first present a new labeled dataset SciNews, containing 2.4k scientific news stories drawn from trusted and untrustworthy sources.
We identify dimensions of scientific validity in science news articles and explore how this can be integrated into the automated detection of scientific misinformation.
arXiv Detail & Related papers (2024-02-22T04:07:00Z) - Towards Corpus-Scale Discovery of Selection Biases in News Coverage:
Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias [54.89737992911079]
We propose a new task: generating a neutral summary from multiple news headlines from across the political spectrum.
One of the most interesting observations is that generation models can hallucinate not only factually inaccurate or unverifiable content, but also politically biased content.
arXiv Detail & Related papers (2022-04-11T07:06:01Z) - "Don't quote me on that": Finding Mixtures of Sources in News Articles [85.92467549469147]
We construct an ontological labeling system for sources based on each source's affiliation and role.
We build a probabilistic model to infer these attributes for named sources and to describe news articles as mixtures of these sources.
arXiv Detail & Related papers (2021-04-19T21:57:11Z) - On Representation Learning for Scientific News Articles Using
Heterogeneous Knowledge Graphs [4.186267062202487]
We present a methodology for creating scientific news article representations by modeling the directed graph between the scientific news articles and the cited scientific publications.
The results show promising applications of graph neural network approaches in the domains of knowledge tracing and scientific news credibility assessment.
arXiv Detail & Related papers (2021-04-12T23:46:54Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.