Self-Supervised Claim Identification for Automated Fact Checking
- URL: http://arxiv.org/abs/2102.02335v1
- Date: Wed, 3 Feb 2021 23:37:09 GMT
- Title: Self-Supervised Claim Identification for Automated Fact Checking
- Authors: Archita Pathak, Mohammad Abuzar Shaikh, Rohini Srihari
- Abstract summary: We propose a novel, attention-based self-supervised approach to identify "claim-worthy" sentences in a fake news article.
We leverage "aboutness" of headline and content using attention mechanism for this task.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel, attention-based self-supervised approach to identify
"claim-worthy" sentences in a fake news article, an important first step in
automated fact-checking. We leverage the "aboutness" of the headline and content
using an attention mechanism for this task. The identified claims can be used for the
downstream task of claim verification, for which we are releasing a benchmark
dataset of manually selected compelling articles with veracity labels and
associated evidence. This work goes beyond stylistic analysis to identifying
content that influences reader belief. Experiments with three datasets show the
strength of our model. Data and code are available at
https://github.com/architapathak/Self-Supervised-ClaimIdentification
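The abstract describes ranking sentences by how strongly they relate ("aboutness") to the headline via attention. The snippet below is a minimal sketch of that idea, not the authors' released architecture (see the repository linked above): it scores precomputed sentence embeddings against a headline embedding using single-query scaled dot-product attention. The function name, the temperature parameter, and the assumption of precomputed embeddings are illustrative choices, not details taken from the paper.

import numpy as np

def rank_claim_worthy(headline_emb: np.ndarray,
                      sentence_embs: np.ndarray,
                      temperature: float = 1.0) -> np.ndarray:
    """Rank article sentences by attention weight against the headline.

    headline_emb:  (d,)   embedding of the headline (the attention query)
    sentence_embs: (n, d) embeddings of the article sentences (keys)
    Returns sentence indices sorted from most to least claim-worthy.
    """
    d = headline_emb.shape[-1]
    # Scaled dot-product scores between the headline and each sentence.
    scores = sentence_embs @ headline_emb / np.sqrt(d)
    # Softmax (with numerical stabilisation) turns scores into attention weights.
    scaled = scores / temperature
    weights = np.exp(scaled - np.max(scaled))
    weights /= weights.sum()
    return np.argsort(-weights)

# Hypothetical usage with precomputed embeddings:
# order = rank_claim_worthy(headline_vec, sentence_mat)
# top_claims = [sentences[i] for i in order[:3]]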
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader model suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised
Active Learning with Label Validation [6.918298428336528]
Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets.
We present CAMELL, a pool-based active learning framework tailored for sequential multi-output problems.
arXiv Detail & Related papers (2023-10-13T08:19:31Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z) - Empowering the Fact-checkers! Automatic Identification of Claim Spans on
Twitter [25.944789217337338]
Claim Span Identification (CSI) is the task of automatically identifying and extracting the snippets of claim-worthy (mis)information present in a post.
We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets.
We benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa.
arXiv Detail & Related papers (2022-10-10T14:08:46Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim
Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores a methodology to identify check-worthy claim sentences in fake news articles.
We leverage two internal supervisory signals, the headline and the abstractive summary, to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
arXiv Detail & Related papers (2021-11-02T16:17:20Z) - AutoTriggER: Label-Efficient and Robust Named Entity Recognition with
Auxiliary Trigger Extraction [54.20039200180071]
We present a novel framework to improve NER performance by automatically generating and leveraging "entity triggers".
Our framework leverages post-hoc explanation to generate rationales and strengthens a model's prior knowledge using an embedding technique.
AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.
arXiv Detail & Related papers (2021-09-10T08:11:56Z) - ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z) - Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the process: generating explanations that justify a fact-checking verdict.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.