Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims
- URL: http://arxiv.org/abs/2204.12294v1
- Date: Tue, 26 Apr 2022 13:18:27 GMT
- Title: Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims
- Authors: Ivan Srba, Branislav Pecher, Matus Tomlein, Robert Moro, Elena Stefancova, Jakub Simko, Maria Bielikova
- Abstract summary: We publish a feature-rich dataset of 317k medical news articles/blogs and 3.5k fact-checked claims.
It also contains 573 manually and more than 51k automatically labelled mappings between claims and articles.
The dataset enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources.
- Score: 0.6927055673104934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: False information has a significant negative influence on individuals as well
as on the whole society. Especially in the current COVID-19 era, we witness an
unprecedented growth of medical misinformation. To help tackle this problem
with machine learning approaches, we are publishing a feature-rich dataset of
approx. 317k medical news articles/blogs and 3.5k fact-checked claims. It also
contains 573 manually and more than 51k automatically labelled mappings between
claims and articles. Mappings consist of claim presence, i.e., whether a claim
is contained in a given article, and article stance towards the claim. We
provide several baselines for these two tasks and evaluate them on the manually
labelled part of the dataset. The dataset enables a number of additional tasks
related to medical misinformation, such as misinformation characterisation
studies or studies of misinformation diffusion between sources.
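To make the two mapping tasks concrete, below is a minimal sketch of how a single claim-article mapping could be represented. The field names, stance categories, and label-source values are illustrative assumptions for this sketch only, not the actual schema of the Monant dataset.
```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical representation of one claim-article mapping as described in the
# abstract. Field names and stance categories are assumptions, not the dataset's
# actual schema.
@dataclass
class ClaimArticleMapping:
    claim_id: str
    article_id: str
    claim_present: bool  # claim presence: does the article contain the claim?
    stance: Literal["supporting", "contradicting", "neutral"]  # article stance towards the claim
    label_source: Literal["manual", "automatic"]  # 573 manual vs. >51k automatic labels

# Example: a manually labelled mapping saying the article repeats and supports the claim.
example = ClaimArticleMapping(
    claim_id="claim-0001",
    article_id="article-0042",
    claim_present=True,
    stance="supporting",
    label_source="manual",
)
print(example)
```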
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation theoretical model for fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z)
- AMIR: Automated MisInformation Rebuttal -- A COVID-19 Vaccination Datasets based Recommendation System [0.05461938536945722]
This work explored how existing information obtained from social media can be harnessed to facilitate automated rebuttal of misinformation at scale.
It leverages two publicly available datasets on COVID-19 vaccination: FaCov (fact-checked articles) and a collection of misleading social media (Twitter) posts.
arXiv Detail & Related papers (2023-10-29T13:07:33Z)
- Lost in Translation -- Multilingual Misinformation and its Evolution [52.07628580627591]
This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
arXiv Detail & Related papers (2023-10-27T12:21:55Z)
- Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain [14.837495995122598]
Med-MMHL is a novel multi-modal misinformation detection dataset in a general medical domain encompassing multiple diseases.
Our dataset aims to facilitate comprehensive research and development of methodologies for detecting misinformation across diverse diseases and various scenarios.
arXiv Detail & Related papers (2023-06-15T05:59:11Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter [25.944789217337338]
Claim Span Identification (CSI) is the task of automatically identifying and extracting the snippets of claim-worthy (mis)information present in a post.
We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets.
We benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa.
arXiv Detail & Related papers (2022-10-10T14:08:46Z)
- CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets [10.536415845097661]
CoVERT is a fact-checked corpus of tweets with a focus on biomedicine and COVID-19-related (mis)information.
We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online.
We use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.
arXiv Detail & Related papers (2022-04-26T09:05:03Z)
- FaVIQ: FAct Verification from Information-seeking Questions [77.7067957445298]
We construct a large-scale fact verification dataset called FaVIQ using information-seeking questions posed by real users.
Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
arXiv Detail & Related papers (2021-07-05T17:31:44Z)
- Claim Detection in Biomedical Twitter Posts [11.335643770130238]
False information on biomedical topics can be particularly dangerous.
We aim to fill this research gap and annotate a corpus of 1200 tweets for implicit and explicit biomedical claims.
We develop baseline models which detect tweets that contain a claim automatically.
arXiv Detail & Related papers (2021-04-23T14:45:31Z)
- Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on their perplexity scores at debunking time (a rough sketch of this idea follows after the list).
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
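As a rough sketch of the perplexity-based debunking idea summarised in the "Misinformation Has High Perplexity" entry above: prime a language model with retrieved evidence and score a claim by the perplexity of its tokens. GPT-2 from Hugging Face transformers is used here only as an assumed stand-in; the paper's actual model, evidence retrieval step, and decision threshold are not reproduced.
```python
# Minimal sketch: perplexity of a claim conditioned on evidence, using GPT-2
# as an assumed stand-in language model (not necessarily the paper's setup).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def claim_perplexity(evidence: str, claim: str) -> float:
    """Perplexity of the claim tokens when the model is primed with the evidence."""
    evidence_ids = tokenizer(evidence, return_tensors="pt").input_ids
    claim_ids = tokenizer(" " + claim, return_tensors="pt").input_ids
    input_ids = torch.cat([evidence_ids, claim_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : evidence_ids.size(1)] = -100  # ignore evidence tokens in the loss
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss  # mean NLL over the claim tokens
    return torch.exp(loss).item()

# Toy usage: under the paper's hypothesis, the unsupported claim should
# receive a noticeably higher perplexity than the supported one.
evidence = "Large clinical trials found the vaccine to be safe and effective."
print(claim_perplexity(evidence, "The vaccine was found to be safe in clinical trials."))
print(claim_perplexity(evidence, "The vaccine was shown to alter human DNA."))
```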
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.