Self-Supervised Claim Identification for Automated Fact Checking
- URL: http://arxiv.org/abs/2102.02335v1
- Date: Wed, 3 Feb 2021 23:37:09 GMT
- Title: Self-Supervised Claim Identification for Automated Fact Checking
- Authors: Archita Pathak, Mohammad Abuzar Shaikh, Rohini Srihari
- Abstract summary: We propose a novel, attention-based self-supervised approach to identify "claim-worthy" sentences in a fake news article.
We leverage "aboutness" of headline and content using attention mechanism for this task.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel, attention-based self-supervised approach to identify
"claim-worthy" sentences in a fake news article, an important first step in
automated fact-checking. We leverage the "aboutness" of the headline and content
using an attention mechanism for this task. The identified claims can be used for the
downstream task of claim verification, for which we are releasing a benchmark
dataset of manually selected compelling articles with veracity labels and
associated evidence. This work goes beyond stylistic analysis to identifying
content that influences reader belief. Experiments with three datasets show the
strength of our model. Data and code are available at
https://github.com/architapathak/Self-Supervised-ClaimIdentification
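The abstract describes ranking sentences by how strongly they relate ("aboutness") to the headline via attention. The snippet below is a minimal sketch of that idea, not the authors' released architecture (see the repository linked above): it scores precomputed sentence embeddings against a headline embedding using single-query scaled dot-product attention. The function name, the temperature parameter, and the assumption of precomputed embeddings are illustrative choices, not details taken from the paper.

import numpy as np

def rank_claim_worthy(headline_emb: np.ndarray,
                      sentence_embs: np.ndarray,
                      temperature: float = 1.0) -> np.ndarray:
    """Rank article sentences by attention weight against the headline.

    headline_emb:  (d,)   embedding of the headline (the attention query)
    sentence_embs: (n, d) embeddings of the article sentences (keys)
    Returns sentence indices sorted from most to least claim-worthy.
    """
    d = headline_emb.shape[-1]
    # Scaled dot-product scores between the headline and each sentence.
    scores = sentence_embs @ headline_emb / np.sqrt(d)
    # Softmax (with numerical stabilisation) turns scores into attention weights.
    scaled = scores / temperature
    weights = np.exp(scaled - np.max(scaled))
    weights /= weights.sum()
    return np.argsort(-weights)

# Hypothetical usage with precomputed embeddings:
# order = rank_claim_worthy(headline_vec, sentence_mat)
# top_claims = [sentences[i] for i in order[:3]]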
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader model suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised
Active Learning with Label Validation [6.918298428336528]
Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets.
We present CAMELL, a pool-based active learning framework tailored for sequential multi-output problems.
arXiv Detail & Related papers (2023-10-13T08:19:31Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z) - Empowering the Fact-checkers! Automatic Identification of Claim Spans on
Twitter [25.944789217337338]
Claim Span Identification (CSI) is the task of automatically identifying and extracting the snippets of claim-worthy (mis)information present in a post.
We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets.
We benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa.
arXiv Detail & Related papers (2022-10-10T14:08:46Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim
Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores a methodology to identify check-worthy claim sentences in fake news articles.
We leverage two internal supervisory signals, the headline and the abstractive summary, to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
arXiv Detail & Related papers (2021-11-02T16:17:20Z) - AutoTriggER: Label-Efficient and Robust Named Entity Recognition with
Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging "entity triggers".
Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique.
AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
arXiv Detail & Related papers (2021-09-10T08:11:56Z) - ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z) - Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the process: generating explanations that justify a fact-checking verdict.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.