Harnessing Abstractive Summarization for Fact-Checked Claim Detection
- URL: http://arxiv.org/abs/2209.04612v2
- Date: Wed, 14 Sep 2022 10:06:24 GMT
- Title: Harnessing Abstractive Summarization for Fact-Checked Claim Detection
- Authors: Varad Bhatnagar, Diptesh Kanojia, Kameswari Chebrolu
- Abstract summary: Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice.
We believe that the solution lies in partial automation of the fact-checking life cycle, saving human time for tasks which require high cognition.
We propose a new workflow for efficiently detecting previously fact-checked claims that uses abstractive summarization to generate crisp queries.
- Score: 8.49182897482236
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media platforms have become new battlegrounds for anti-social
elements, with misinformation being the weapon of choice. Fact-checking
organizations try to debunk as many claims as possible while staying true to
their journalistic processes but cannot cope with the rapid dissemination of misinformation. We
believe that the solution lies in partial automation of the fact-checking life
cycle, saving human time for tasks which require high cognition. We propose a
new workflow for efficiently detecting previously fact-checked claims that uses
abstractive summarization to generate crisp queries. These queries can then be
executed on a general-purpose retrieval system associated with a collection of
previously fact-checked claims. We curate an abstractive text summarization
dataset comprising noisy claims from Twitter and their gold summaries. It is
shown that retrieval performance improves 2x by using popular out-of-the-box
summarization models and 3x by fine-tuning them on the accompanying dataset
compared to verbatim querying. Our approach achieves Recall@5 and MRR of 35%
and 0.3, compared to baseline values of 10% and 0.1, respectively. Our dataset,
code, and models are available publicly:
https://github.com/varadhbhatnagar/FC-Claim-Det/
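The summarize-then-retrieve workflow described above can be illustrated with a minimal sketch. The summarization model, the BM25 retriever, and the toy claim collection below are illustrative assumptions, not the paper's exact configuration; the authors' actual code is in the linked repository.

```python
# Minimal sketch of the summarize-then-retrieve workflow (illustrative only;
# the model and retrieval system here are assumptions, not the paper's setup).
from transformers import pipeline   # assumes: pip install transformers
from rank_bm25 import BM25Okapi     # assumes: pip install rank-bm25

# Hypothetical collection of previously fact-checked claims.
fact_checked_claims = [
    "COVID-19 vaccines do not alter human DNA.",
    "5G towers do not spread viruses.",
    "Drinking bleach does not cure infections.",
]

# General-purpose lexical retriever over the claim collection.
bm25 = BM25Okapi([c.lower().split() for c in fact_checked_claims])

# Out-of-the-box abstractive summarizer; the paper reports further gains
# from fine-tuning such models on its Twitter claim-summarization dataset.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def retrieve(noisy_tweet: str, k: int = 5) -> list[str]:
    """Summarize a noisy tweet into a crisp query, then rank claims by BM25."""
    query = summarizer(noisy_tweet, max_length=32, min_length=5)[0]["summary_text"]
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [fact_checked_claims[i] for i in ranked[:k]]

# The reported metrics are standard and can be computed as follows.
def recall_at_k(ranked_ids: list[int], gold_id: int, k: int = 5) -> float:
    return float(gold_id in ranked_ids[:k])

def mrr(ranked_ids: list[int], gold_id: int) -> float:
    return 1.0 / (ranked_ids.index(gold_id) + 1) if gold_id in ranked_ids else 0.0
```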
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
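The blurb does not spell out CFR's training objective. A generic contrastive (InfoNCE-style) loss over claim-evidence pairs with in-batch negatives, the usual pattern for contrastive retrieval training, might look like the sketch below; this is not necessarily CFR's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(claim_emb: torch.Tensor,
                  evidence_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Generic contrastive loss for retrieval training (an assumption,
    not CFR's published objective): each claim is pulled toward its
    paired evidence passage and pushed away from the other passages
    in the batch. Inputs are (batch, dim) L2-normalized embeddings."""
    logits = claim_emb @ evidence_emb.T / temperature        # (batch, batch)
    targets = torch.arange(len(logits), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```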
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that a retriever-reader model suffers performance deterioration when it is trained on labeled data from one domain and evaluated on another.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- Correction with Backtracking Reduces Hallucination in Summarization [29.093092115901694]
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements.
Despite recent advances, neural text summarization models are known to be susceptible to hallucination.
We introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization.
arXiv Detail & Related papers (2023-10-24T20:48:11Z)
- MythQA: Query-Based Large-Scale Check-Worthy Claim Detection through Multi-Answer Open-Domain Question Answering [8.70509665552136]
Check-worthy claim detection aims to surface plausible misinformation for downstream fact-checking systems or human experts to verify.
Many efforts have been put into how to identify check-worthy claims from a small scale of pre-collected claims, but how to efficiently detect check-worthy claims directly from a large-scale information source, such as Twitter, remains underexplored.
We introduce MythQA, a new multi-answer open-domain question answering (QA) task that involves contradictory stance mining for query-based large-scale check-worthy claim detection.
arXiv Detail & Related papers (2023-07-21T18:35:24Z)
- Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels [56.81761908354718]
We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
arXiv Detail & Related papers (2023-01-02T07:13:28Z)
- Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
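The blurb does not give the adapter architecture; a standard bottleneck adapter (down-project, nonlinearity, up-project, residual), the common design such few-shot methods build on, might look like the following sketch.

```python
import torch
from torch import nn

class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter (a common design; not necessarily the
    paper's exact variant). Only these small modules are fine-tuned, so
    in-domain knowledge is stored cheaply while the base model stays frozen."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behavior as the default.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```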
arXiv Detail & Related papers (2022-05-04T16:38:37Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
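A highly simplified rendering of this retrieve-then-substitute idea follows; the token-overlap retrieval and spaCy NER used here are illustrative stand-ins, since RFEC's actual components are learned models.

```python
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def correct_entities(summary: str, document_sentences: list[str]) -> str:
    """Toy rendition of retrieve-then-substitute entity correction
    (an illustration, not RFEC's implementation)."""
    # 1. Retrieve evidence: the source sentence with the most token overlap.
    s_tokens = set(summary.lower().split())
    evidence = max(document_sentences,
                   key=lambda s: len(s_tokens & set(s.lower().split())))
    evidence_ents = {(e.text, e.label_) for e in nlp(evidence).ents}
    # 2. Detect entity-level errors: summary entities unsupported by evidence.
    corrected = summary
    for ent in nlp(summary).ents:
        if (ent.text, ent.label_) not in evidence_ents:
            # 3. Substitute with a same-type entity from the evidence, if any.
            candidates = [t for t, label in evidence_ents if label == ent.label_]
            if candidates:
                corrected = corrected.replace(ent.text, candidates[0])
    return corrected
```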
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- Scalable Fact-checking with Human-in-the-Loop [17.1138216746642]
Aiming to accelerate fact-checking, we group similar messages and summarize them into aggregated claims.
The results show the potential to speed up the fact-checking process by organizing and selecting representative claims from massive disorganized and redundant messages.
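The grouping step could be prototyped as below; the embedding model, the clustering algorithm, and the cluster count are illustrative assumptions, not the paper's choices.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.cluster import KMeans

def group_messages(messages: list[str], n_groups: int = 10) -> dict[int, list[str]]:
    """Group similar/redundant messages so that one representative,
    aggregated claim per group can be passed to a human fact-checker
    (illustrative pipeline, not the paper's implementation)."""
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(messages)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(embeddings)
    groups: dict[int, list[str]] = {}
    for msg, label in zip(messages, labels):
        groups.setdefault(int(label), []).append(msg)
    # Each group would then be summarized into one aggregated claim for
    # human review (the human-in-the-loop step).
    return groups
```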
arXiv Detail & Related papers (2021-09-22T19:19:59Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)