Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
- URL: http://arxiv.org/abs/2408.12060v2
- Date: Fri, 4 Oct 2024 19:31:41 GMT
- Title: Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
- Authors: Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das
- Abstract summary: We use the AVeriTeC dataset to assess the performance of our fact-checking system.
In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset.
Our system achieves an AVeriTeC score of 0.33, which is a 22% absolute improvement over the baseline.
- Score: 9.785096589765908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is highly challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We use the AVeriTeC dataset (Schlichtkrull et al., 2023) to assess the performance of our fact-checking system. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieval-Augmented Generation (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then fed, along with the claim, into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an AVeriTeC score of 0.33, a 22% absolute improvement over the baseline. Our code is publicly available at https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms.
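As an illustration of this design, here is a minimal sketch of a retrieve-then-classify pipeline, assuming a generic sentence-embedding retriever and a hypothetical `call_llm` helper; the authors' actual prompts, retriever, and models live in the linked repository and may differ:

```python
# Illustrative retrieve-then-classify pipeline; a sketch, not the authors'
# exact implementation (see the linked repository for that).
from sentence_transformers import SentenceTransformer, util

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whichever LLM API is used."""
    raise NotImplementedError

def retrieve_evidence(claim: str, knowledge_base: list[str], k: int = 5) -> list[str]:
    # Rank knowledge-base sentences by cosine similarity to the claim.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    claim_emb = model.encode(claim, convert_to_tensor=True)
    kb_embs = model.encode(knowledge_base, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, kb_embs)[0]
    top = scores.topk(min(k, len(knowledge_base))).indices.tolist()
    return [knowledge_base[i] for i in top]

def classify_claim(claim: str, evidence: list[str],
                   few_shot_examples: list[tuple[str, str, str]]) -> str:
    # Few-shot ICL: prepend labeled (claim, evidence, verdict) demonstrations.
    demos = "\n\n".join(f"Claim: {c}\nEvidence: {e}\nVerdict: {v}"
                        for c, e, v in few_shot_examples)
    prompt = (f"{demos}\n\nClaim: {claim}\nEvidence: {' '.join(evidence)}\n"
              "Verdict (Supported / Refuted / Not Enough Evidence / "
              "Conflicting Evidence):")
    return call_llm(prompt)
```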
Related papers
- Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 [1.3923460621808879]
We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate fact verification.
We integrate LLMs and search under a multi-hop evidence pursuit strategy.
Our submitted system achieves a .510 AVeriTeC score on the dev set and a .477 AVeriTeC score on the test set.
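A hedged sketch of what such a multi-hop loop can look like, assuming hypothetical `call_llm` and `web_search` helpers; the team's actual prompts and stopping criterion may differ:

```python
def multi_hop_pursuit(claim, call_llm, web_search, max_hops=3):
    """Illustrative multi-hop evidence pursuit: the LLM proposes the next
    search query given the evidence gathered so far, a search engine
    answers it, and the loop stops when the LLM judges the evidence
    sufficient. Hypothetical helpers; not the team's exact system."""
    evidence = []
    for _ in range(max_hops):
        query = call_llm(
            f"Claim: {claim}\nEvidence so far: {evidence}\n"
            "Reply with one search query that would retrieve the next "
            "missing fact, or DONE if the evidence already suffices."
        )
        if query.strip().upper() == "DONE":
            break
        evidence.extend(web_search(query))  # list of snippet strings
    return evidence
```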
arXiv Detail & Related papers (2024-11-08T18:25:06Z)
- ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems [2.8692611791027893]
Retrieval-Augmented Generation (RAG) systems can generate inaccurate responses when they retrieve irrelevant or loosely related information.
We propose ChunkRAG, a framework that enhances RAG systems by evaluating and filtering retrieved information at the chunk level.
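A minimal sketch of chunk-level filtering under that reading, with a hypothetical LLM-based relevance scorer:

```python
def llm_relevance(query: str, chunk: str, call_llm) -> float:
    """Hypothetical scorer: ask an LLM for a 0-1 relevance rating."""
    reply = call_llm(f"Query: {query}\nChunk: {chunk}\n"
                     "Rate the chunk's relevance to the query from 0 to 1. "
                     "Reply with a number only.")
    return float(reply.strip())

def filter_chunks(query, chunks, score_chunk, threshold=0.5):
    # Drop retrieved chunks that fail the relevance check before they
    # reach the generator, instead of passing whole documents through.
    return [c for c in chunks if score_chunk(query, c) >= threshold]
```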
arXiv Detail & Related papers (2024-10-25T14:07:53Z)
- Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets [9.67464173044675]
Visual Question Answering (VQA) is the task of answering a question about an image.
We present an approach for declarative knowledge distillation from Large Language Models (LLMs).
Our results confirm that distilling knowledge from LLMs is in fact a promising direction alongside data-driven rule-learning approaches.
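The summary leaves the mechanics implicit; one hedged reading is to prompt an LLM for declarative rules (e.g., in Answer Set Programming syntax) that a symbolic VQA pipeline can then apply. A sketch under that assumption:

```python
def distill_rules(call_llm, task_description: str, example_questions: list[str]) -> str:
    """Hypothetical sketch: ask an LLM to write declarative (ASP-style)
    rules that a symbolic VQA reasoner could apply to scene-graph facts.
    The prompt wording and predicate names are assumptions."""
    prompt = (
        f"Task: {task_description}\n"
        f"Example questions: {example_questions}\n"
        "Write Answer Set Programming rules that derive the answer from "
        "facts such as object(O), attribute(O, A), relation(O1, R, O2)."
    )
    return call_llm(prompt)
```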
arXiv Detail & Related papers (2024-10-12T08:17:03Z)
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human-written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
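As a generic illustration of contrastive training for retrieval (not necessarily CFR's exact loss or negative-mining strategy), here is an InfoNCE-style objective over one annotated-evidence positive and several distractor negatives:

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE-style contrastive loss: pull the claim/subquestion embedding
    toward the annotated evidence (positive, index 0) and away from
    distractor passages (negatives)."""
    pos = F.cosine_similarity(query_emb, pos_emb, dim=-1) / temperature        # scalar
    negs = F.cosine_similarity(query_emb.unsqueeze(0), neg_embs) / temperature # (n,)
    logits = torch.cat([pos.unsqueeze(0), negs]).unsqueeze(0)                  # (1, n+1)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```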
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation [19.312330150540912]
An emerging application is using Large Language Models (LLMs) to enhance retrieval-augmented generation (RAG) capabilities.
We propose FRAMES, a high-quality evaluation dataset designed to test LLMs' ability to provide factual responses.
We present baseline results demonstrating that even state-of-the-art LLMs struggle with this task, achieving 0.40 accuracy with no retrieval.
arXiv Detail & Related papers (2024-09-19T17:52:07Z)
- OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs [64.25176233153657]
OpenFactCheck is an open-source fact-checking framework for large language models.
It allows users to easily customize an automatic fact-checking system.
It also assesses the factuality of all claims in an input document using that system.
arXiv Detail & Related papers (2024-08-06T15:49:58Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark thus illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems [103.91826112815384]
Citation-based QA systems suffer from two shortcomings: they usually rely only on the web as a source of extracted knowledge, and adding other external knowledge sources can hamper the system's efficiency.
We propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system.
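A minimal sketch of that enrichment idea, assuming hypothetical `web_search` and `kg_retrieve` helpers; EWEK-QA's actual retrieval and ranking logic may differ:

```python
def build_context(question, web_search, kg_retrieve, max_items=10):
    """Illustrative merge of web snippets with verbalized KG triples;
    a sketch of the enrichment idea, not EWEK-QA's implementation."""
    web = web_search(question)                                   # text snippets
    kg = [f"{s} {r} {o}." for s, r, o in kg_retrieve(question)]  # (subj, rel, obj) triples
    # Interleave the two sources so neither dominates the truncated context.
    merged = [x for pair in zip(web, kg) for x in pair]
    merged += web[len(kg):] + kg[len(web):]
    return merged[:max_items]
```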
arXiv Detail & Related papers (2024-06-14T19:40:38Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift, where ingested third-party text covertly redirects the LLM from the user's original instructions, allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
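A minimal sketch of such a linear probe, with random placeholder features standing in for activation deltas extracted from a real LLM:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Illustrative linear probe: X holds one activation (delta) vector per
# conversation, y marks whether an injected instruction caused task drift.
# Random placeholders stand in for features extracted from a real LLM.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 768)), rng.integers(0, 2, size=200)
X_test, y_test = rng.normal(size=(50, 768)), rng.integers(0, 2, size=50)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```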
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
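A hedged sketch of the two-stage self-prompting idea, with a hypothetical `call_llm` helper and prompt wording that is an assumption rather than the paper's:

```python
def self_prompt_answer(question: str, call_llm, n_demos: int = 4) -> str:
    """Illustrative two-stage self-prompting. Stage 1: the LLM writes its
    own pseudo (passage, question, answer) demonstrations from parametric
    knowledge. Stage 2: the demos condition the answer to the real question.
    Hypothetical helper and prompt wording."""
    demos = call_llm(
        f"Write {n_demos} short passages, each followed by a question about "
        "the passage and its correct answer, formatted as:\n"
        "Passage: ...\nQ: ...\nA: ..."
    )
    return call_llm(f"{demos}\n\nQ: {question}\nA:")
```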
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.