Misinformation Has High Perplexity
- URL: http://arxiv.org/abs/2006.04666v2
- Date: Wed, 10 Jun 2020 08:49:30 GMT
- Title: Misinformation Has High Perplexity
- Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung
- Abstract summary: We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
- Score: 55.47422012881148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Debunking misinformation is an important and time-critical task as there
could be adverse consequences when misinformation is not quashed promptly.
However, the usual supervised approach to debunking via misinformation
classification requires human-annotated data and is not suited to the fast
time-frame of newly emerging events such as the COVID-19 outbreak. In this
paper, we postulate that misinformation itself has higher perplexity compared
to truthful statements, and propose to leverage the perplexity to debunk false
claims in an unsupervised manner. First, we extract reliable evidence from
scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally
evaluate the correctness of given claims based on the perplexity scores at
debunking time. We construct two new COVID-19-related test sets, one
scientific and the other political in content, and empirically verify that
our system performs favorably compared to existing systems. We are releasing
these datasets publicly to encourage more research in debunking misinformation
on COVID-19 and other topics.
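To make the two-step procedure concrete, below is a minimal Python sketch of the idea using off-the-shelf Hugging Face models. The retriever and language model choices (all-MiniLM-L6-v2, GPT-2), the toy evidence corpus, and the notion of a tuned decision threshold are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of perplexity-based debunking: retrieve evidence by
# sentence similarity, prime a causal LM with it, and score the claim.
# Model names and the toy corpus are assumptions for illustration only.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Step 1: pick the evidence sentence most similar to the claim.
retriever = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever
corpus = [
    "Masks reduce the transmission of respiratory droplets.",
    "Vaccines are evaluated in clinical trials before approval.",
]
claim = "Wearing masks has no effect on virus transmission."
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)
claim_emb = retriever.encode(claim, convert_to_tensor=True)
evidence = corpus[util.cos_sim(claim_emb, corpus_emb).argmax().item()]

# Step 2: prime the LM with the evidence and compute the claim's perplexity.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def conditional_perplexity(evidence: str, claim: str) -> float:
    """Perplexity of the claim's tokens, conditioned on the evidence prefix."""
    prefix = tok(evidence + " ", return_tensors="pt").input_ids
    target = tok(claim, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix, target], dim=1)
    labels = input_ids.clone()
    labels[:, : prefix.size(1)] = -100  # ignore prefix positions in the loss
    with torch.no_grad():
        loss = lm(input_ids, labels=labels).loss  # mean NLL over claim tokens
    return torch.exp(loss).item()

print(f"evidence: {evidence}")
print(f"claim perplexity: {conditional_perplexity(evidence, claim):.1f}")
# A claim scoring above some threshold tuned on held-out data (an assumption
# here) would be flagged as likely misinformation.
```

In this reading, a false claim that contradicts the primed evidence should be "surprising" to the model, which is exactly the higher-perplexity signal the abstract postulates.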
Related papers
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation-theoretic model of fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z)
- AMIR: Automated MisInformation Rebuttal -- A COVID-19 Vaccination Datasets based Recommendation System [0.05461938536945722]
This work explored how existing information obtained from social media can be harnessed to facilitate automated rebuttal of misinformation at scale.
It leverages two publicly available datasets on COVID-19 vaccination: FaCov (fact-checked articles) and misleading social media (Twitter) data.
arXiv Detail & Related papers (2023-10-29T13:07:33Z)
- Reinforcement Learning-based Counter-Misinformation Response Generation: A Case Study of COVID-19 Vaccine Misinformation [19.245814221211415]
Non-expert, ordinary users act as eyes on the ground, proactively countering misinformation.
In this work, we create two novel datasets of misinformation and counter-misinformation response pairs.
We propose MisinfoCorrect, a reinforcement learning-based framework that learns to generate counter-misinformation responses.
arXiv Detail & Related papers (2023-03-11T15:55:01Z)
- Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation [67.69725605939315]
Misinformation emerges in times of uncertainty when credible information is limited.
This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available.
arXiv Detail & Related papers (2022-10-25T09:40:48Z)
- Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim.
We present ClaimDecomp, a dataset of decompositions for over 1000 claims.
We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z)
- FaVIQ: FAct Verification from Information-seeking Questions [77.7067957445298]
We construct a large-scale fact verification dataset called FaVIQ using information-seeking questions posed by real users.
Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
arXiv Detail & Related papers (2021-07-05T17:31:44Z)
- COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic [12.078052727772718]
We introduce COVID-Fact, a FEVER-like dataset of 4,086 claims concerning the COVID-19 pandemic.
The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence.
arXiv Detail & Related papers (2021-06-07T16:59:46Z)
- Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking [0.3441021278275805]
We develop a two-stage automated pipeline for COVID-19 fake news detection using state-of-the-art machine learning models for natural language processing.
The first model leverages a novel fact-checking algorithm that retrieves the facts most relevant to a user's COVID-19 claim.
The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset (a minimal sketch of such an entailment check follows this list).
arXiv Detail & Related papers (2020-11-26T11:50:45Z)
- Fact or Fiction: Verifying Scientific Claims [53.29101835904273]
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
arXiv Detail & Related papers (2020-04-30T17:22:57Z)
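As referenced in the Two Stage Transformer entry above, a claim-versus-fact entailment check can be sketched with an off-the-shelf natural language inference model. The roberta-large-mnli checkpoint and the example texts below are illustrative assumptions, not that paper's actual models or data.

```python
# Hypothetical sketch of claim verification via textual entailment, using a
# generic MNLI model; not the Two Stage Transformer paper's implementation.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

fact = "COVID-19 vaccines were evaluated in large randomized clinical trials."
claim = "COVID-19 vaccines were never tested before release."

# Premise/hypothesis pairs are passed as a dict; the pipeline builds the
# paired input for the model.
result = nli({"text": fact, "text_pair": claim})
print(result)  # e.g. {'label': 'CONTRADICTION', 'score': ...}
```

A CONTRADICTION label against retrieved facts would mark the claim as false, ENTAILMENT would support it, and NEUTRAL cases would call for further evidence.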
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.