Performance Prediction for Multi-hop Questions
- URL: http://arxiv.org/abs/2308.06431v1
- Date: Sat, 12 Aug 2023 01:34:41 GMT
- Title: Performance Prediction for Multi-hop Questions
- Authors: Mohammadreza Samadi, Davood Rafiei
- Abstract summary: We propose multHP, a novel pre-retrieval method for predicting the performance of open-domain multi-hop questions.
Our evaluation shows that the proposed model is a strong predictor of performance, outperforming traditional single-hop QPP models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of Query Performance Prediction (QPP) for open-domain
multi-hop Question Answering (QA), where the task is to estimate the difficulty
of evaluating a multi-hop question over a corpus. Despite the extensive
research on predicting the performance of ad-hoc and QA retrieval models, there
has been a lack of study on the estimation of the difficulty of multi-hop
questions. The problem is challenging due to the multi-step nature of the
retrieval process, the potential dependencies between steps, and the reasoning
involved. To tackle this challenge, we propose multHP, a novel pre-retrieval
method for predicting the performance of open-domain multi-hop questions. Our
extensive evaluation on the largest multi-hop QA dataset using several modern
QA systems shows that the proposed model is a strong predictor of the
performance, outperforming traditional single-hop QPP models. Additionally, we
demonstrate that our approach can be effectively used to optimize the
parameters of QA systems, such as the number of documents to be retrieved,
resulting in improved overall retrieval performance.
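The abstract does not describe multHP in enough detail to reproduce it, but the general flavor of a pre-retrieval predictor can be sketched with a classic term-specificity signal such as average IDF, computed from the question alone before any retrieval happens. The function name and corpus statistics below are illustrative assumptions, not taken from the paper:

```python
import math

def avg_idf(question_terms, doc_freq, num_docs):
    """Average inverse document frequency of the question's terms.

    A classic pre-retrieval QPP signal: questions built from rare
    (high-IDF) terms are typically easier for a retriever to pin
    down, while questions made of common terms tend to be harder.
    """
    idfs = [
        math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term in question_terms
    ]
    return sum(idfs) / len(idfs)

# Toy (hypothetical) corpus statistics: term -> document frequency.
doc_freq = {"who": 900, "directed": 40, "film": 120, "inception": 3}
num_docs = 1000

score = avg_idf(["who", "directed", "inception"], doc_freq, num_docs)
```

A multi-hop predictor like multHP would go beyond such single-hop signals, since it must also account for dependencies between retrieval steps; this sketch only shows the kind of pre-retrieval evidence such predictors start from.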
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late-interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
- End-to-End Beam Retrieval for Multi-Hop Question Answering
Multi-hop question answering involves finding multiple relevant passages and step-by-step reasoning to answer complex questions.
Previous retrievers were customized for two-hop questions, and most of them were trained separately across different hops.
We introduce Beam Retrieval, an end-to-end beam retrieval framework for multi-hop QA.
arXiv Detail & Related papers (2023-08-17T13:24:14Z)
- Query Performance Prediction: From Ad-hoc to Conversational Search
Query performance prediction (QPP) is a core task in information retrieval.
Research has shown the effectiveness and usefulness of QPP for ad-hoc search.
Despite its potential, QPP for conversational search has been little studied.
arXiv Detail & Related papers (2023-05-18T12:37:01Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes
Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs.
Recent work in multi-hop QA has shown that performance can be boosted by first decomposing the questions into simpler, single-hop questions.
We show that decomposition is an effective form of probing QA systems as well as a promising approach to explanation generation.
arXiv Detail & Related papers (2022-04-16T01:03:36Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation
We take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.