FacTeR-Check: Semi-automated fact-checking through Semantic Similarity
and Natural Language Inference
- URL: http://arxiv.org/abs/2110.14532v1
- Date: Wed, 27 Oct 2021 15:44:54 GMT
- Title: FacTeR-Check: Semi-automated fact-checking through Semantic Similarity
and Natural Language Inference
- Authors: Alejandro Martín and Javier Huertas-Tato and Álvaro
Huertas-García and Guillermo Villar-Rodríguez and David Camacho
- Abstract summary: FacTeR-Check enables retrieving fact-checked information, verifying unchecked claims and tracking dangerous information over social media.
The architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media.
Our results show state-of-the-art performance on the individual benchmarks, as well as producing useful analysis of the evolution over time of 61 different hoaxes.
- Score: 61.068947982746224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our society produces and shares overwhelming amounts of information
through Online Social Networks (OSNs). Within this environment, misinformation
and disinformation have proliferated, becoming a public safety concern in
every country. Allowing the public and professionals to efficiently find
reliable evidence about the factual veracity of a claim is crucial to
mitigating this harmful spread. To this end, we propose FacTeR-Check, a
multilingual architecture for semi-automated fact-checking that is useful both
for the general public and for fact-checking organisations. FacTeR-Check
enables retrieving fact-checked information, verifying unchecked claims and
tracking dangerous information over social media. The architecture involves
several modules developed to evaluate semantic similarity, to calculate
natural language inference and to retrieve information from Online Social
Networks. Together, these modules form a semi-automated fact-checking tool
able to verify new claims, extract related evidence, and track the evolution
of a hoax on an OSN. While individual modules are validated on related
benchmarks (mainly MSTS and SICK), the complete architecture is validated
using a new dataset called NLI19-SP that is publicly released with COVID-19
related hoaxes and tweets from Spanish social media. Our results show
state-of-the-art performance on the individual benchmarks, as well as useful
analysis of the evolution over time of 61 different hoaxes.
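The two-stage pipeline described in the abstract, retrieving semantically similar fact-checks and then judging a claim against them with natural language inference, can be illustrated with a minimal, runnable sketch. The bag-of-words cosine similarity and keyword-overlap "NLI" below are toy stand-ins for the multilingual transformer embedding and NLI models the architecture actually uses; all names and thresholds are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for transformer sentence embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(claim: str, fact_checks: list[str], threshold: float = 0.3) -> list[str]:
    """Stage 1: return fact-checks semantically similar to the new claim."""
    cv = embed(claim)
    scored = sorted(((cosine(cv, embed(fc)), fc) for fc in fact_checks),
                    reverse=True)
    return [fc for score, fc in scored if score >= threshold]

def nli_verdict(premise: str, hypothesis: str) -> str:
    """Stage 2: stand-in for an NLI model; a real system predicts
    entailment / contradiction / neutral with a transformer."""
    return "related" if cosine(embed(premise), embed(hypothesis)) >= 0.5 else "neutral"
```

For example, `retrieve("bleach cures covid-19", hoax_database)` would surface previously debunked hoaxes about bleach, which stage 2 then compares against the new claim.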
Related papers
- How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Verification based on large language or multimodal models has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content.
We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer.
Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z)
- Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation [11.323961700172175]
FACT-GPT is a framework designed to automate the claim matching phase of fact-checking using Large Language Models.
This framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers.
We evaluated FACT-GPT on an extensive dataset of social media content related to public health.
arXiv Detail & Related papers (2023-10-13T16:21:07Z)
- FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking [10.046323978189847]
We propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance.
Our approach involves leveraging search engines to retrieve relevant evidence for a given input claim.
Then, we instruct-tune an open-sourced language model, called LLaMA, using this evidence, enabling it to predict the veracity of the input claim more accurately.
arXiv Detail & Related papers (2023-09-01T04:14:39Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulated social media posts and pinpoint the manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z)
- Applying Automatic Text Summarization for Fake News Detection [4.2177790395417745]
The distribution of fake news is not a new problem, but a rapidly growing one.
We present an approach to the problem that leverages transformer-based language models.
Our framework, CMTR-BERT, combines multiple text representations and enables the incorporation of contextual information.
arXiv Detail & Related papers (2022-04-04T21:00:55Z)
- Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings.
We show that these systems suffer significant performance drops against these attacks.
We discuss the growing threat of modern NLG systems as generators of disinformation.
arXiv Detail & Related papers (2022-02-18T19:01:01Z)
- CsFEVER and CTKFacts: Czech Datasets for Fact Verification [0.0]
We present two Czech datasets aimed for training automated fact-checking machine learning models.
The first dataset, CsFEVER, contains approximately 112k claims and is an automatically generated Czech version of the well-known Wikipedia-based FEVER dataset.
The second dataset CTKFacts of 3,097 claims is built on the corpus of approximately two million Czech News Agency news reports.
arXiv Detail & Related papers (2022-01-26T18:48:42Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Sentimental LIAR: Extended Corpus and Deep Learning Models for Fake Claim Classification [11.650381752104296]
This paper proposes a novel deep learning approach for automated detection of false short-text claims on social media.
We first introduce Sentimental LIAR, which extends the LIAR dataset of short claims by adding features based on sentiment and emotion analysis of claims.
Our results demonstrate that the proposed architecture trained on Sentimental LIAR can achieve an accuracy of 70%, which is an improvement of 30% over previously reported results for the LIAR benchmark.
arXiv Detail & Related papers (2020-09-01T02:48:11Z)
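The feature-augmentation idea behind Sentimental LIAR, enriching each short claim with sentiment and emotion signals before classification, can be sketched with a toy lexicon. The word lists and scoring below are illustrative; the dataset itself derives its features from dedicated sentiment and emotion analysis models, not a hand-built lexicon.

```python
# Hypothetical mini-lexicons, for illustration only.
POS = {"good", "great", "safe", "true", "effective"}
NEG = {"bad", "dangerous", "fake", "false", "harmful"}

def sentiment_features(claim: str) -> dict:
    """Augment a claim with toy sentiment counts, mimicking the idea of
    extending each LIAR claim with sentiment/emotion features."""
    tokens = claim.lower().split()
    pos = sum(t in POS for t in tokens)
    neg = sum(t in NEG for t in tokens)
    return {"text": claim, "pos": pos, "neg": neg, "polarity": pos - neg}
```

A downstream classifier would then consume the claim text together with these extra numeric features rather than the text alone.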
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.