Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024
- URL: http://arxiv.org/abs/2411.05762v1
- Date: Fri, 08 Nov 2024 18:25:06 GMT
- Title: Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024
- Authors: Christopher Malon
- Abstract summary: We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate this process.
We integrate LLMs and search under a multi-hop evidence pursuit strategy.
Our submitted system achieves .510 AVeriTeC score on the dev set and .477 AVeriTeC score on the test set.
- Abstract: Separating disinformation from fact on the web has long challenged both the search and the reasoning powers of humans. We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate this process and explainably verify claims. We integrate LLMs and search under a multi-hop evidence pursuit strategy. This strategy generates an initial question based on an input claim using a sequence to sequence model, searches and formulates an answer to the question, and iteratively generates follow-up questions to pursue the evidence that is missing using an LLM. We demonstrate our system on the FEVER 2024 (AVeriTeC) shared task. Compared to a strategy of generating all the questions at once, our method obtains .045 higher label accuracy and .155 higher AVeriTeC score (evaluating the adequacy of the evidence). Through ablations, we show the importance of various design choices, such as the question generation method, medium-sized context, reasoning with one document at a time, adding metadata, paraphrasing, reducing the problem to two classes, and reconsidering the final verdict. Our submitted system achieves .510 AVeriTeC score on the dev set and .477 AVeriTeC score on the test set.
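The iterative loop described in the abstract (generate an initial question from the claim, search and answer it, then generate follow-up questions for the missing evidence) can be sketched as below. The function names `generate_question`, `search`, and `answer` are hypothetical stand-ins for the paper's seq2seq question generator, web search, and LLM reader, not the authors' actual components.

```python
def pursue_evidence(claim, generate_question, search, answer, max_hops=3):
    """Iteratively generate questions, retrieve documents, and answer,
    until the question generator signals no evidence is missing or the
    hop budget is exhausted. Returns the gathered (question, answer) pairs."""
    evidence = []  # (question, answer) pairs gathered so far
    question = generate_question(claim, evidence)  # initial question from the claim
    for _ in range(max_hops):
        docs = search(question)  # retrieve documents for the current question
        evidence.append((question, answer(question, docs)))
        # follow-up question targeting whatever evidence is still missing
        question = generate_question(claim, evidence)
        if question is None:  # generator signals the evidence is complete
            break
    return evidence
```

The key design choice reflected here, per the abstract, is that follow-up questions are conditioned on the evidence gathered so far, rather than generating all questions at once.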
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context [31.091013417498825]
We propose a simple yet effective method called context repetition (CoRe)
CoRe involves prompting the model by repeatedly presenting the context to ensure the supporting documents are presented in the optimal order for the model.
We improve the F1 score by up to 30 percentage points on multi-hop QA tasks and increase accuracy by up to 70 percentage points on a synthetic task.
arXiv Detail & Related papers (2024-10-09T17:41:53Z)
- Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs [9.785096589765908]
We use the Averitec dataset to assess the performance of our fact-checking system.
In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset.
Our system achieves an 'Averitec' score of 0.33, which is a 22% absolute improvement over the baseline.
arXiv Detail & Related papers (2024-08-22T01:42:34Z)
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
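One simple aggregation method of the kind analyzed above is majority voting over the answers returned by the independent agents. This is a generic illustration, not necessarily the aggregation that the MELE paper settles on.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among the agents' responses."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]  # (answer, count) of the top answer
```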
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- Venn Diagram Prompting : Accelerating Comprehension with Scaffolding Effect [0.0]
We introduce Venn Diagram (VD) Prompting, an innovative prompting technique which allows Large Language Models (LLMs) to combine and synthesize information across documents.
Our proposed technique also aims to eliminate the inherent position bias in the LLMs, enhancing consistency in answers by removing sensitivity to the sequence of input information.
In experiments on four public benchmark question-answering datasets, VD prompting consistently matches or surpasses the performance of a meticulously crafted instruction prompt.
arXiv Detail & Related papers (2024-06-08T06:27:26Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Adaptive Information Seeking for Open-Domain Question Answering [61.39330982757494]
We propose a novel adaptive information-seeking strategy for open-domain question answering, namely AISO.
According to the learned policy, AISO could adaptively select a proper retrieval action to seek the missing evidence at each step.
AISO outperforms all baseline methods with predefined strategies in terms of both retrieval and answer evaluations.
arXiv Detail & Related papers (2021-09-14T15:08:13Z) - A Clarifying Question Selection System from NTES_ALONG in Convai3
Challenge [8.656503175492375]
This paper presents the participation of the NetEase Game AI Lab team in the ClariQ challenge at the Search-oriented Conversational AI (SCAI) EMNLP workshop in 2020.
The challenge asks for a complete conversational information retrieval system that can understand and generate clarification questions.
We propose a clarifying question selection system which consists of response understanding, candidate question recalling and clarifying question ranking.
arXiv Detail & Related papers (2020-10-27T11:22:53Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.