Generative Context Pair Selection for Multi-hop Question Answering
- URL: http://arxiv.org/abs/2104.08744v1
- Date: Sun, 18 Apr 2021 07:00:48 GMT
- Title: Generative Context Pair Selection for Multi-hop Question Answering
- Authors: Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun,
Bing Xiang, Matt Gardner, Sameer Singh
- Abstract summary: We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set.
- Score: 60.74354009152721
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Compositional reasoning tasks, like multi-hop question answering, require
making latent decisions to arrive at the final answer, given a question. However,
crowdsourced datasets often capture only a slice of the underlying task
distribution, which can induce unanticipated biases in models performing
compositional reasoning. Furthermore, discriminatively trained models exploit
such biases to achieve better held-out performance without learning the right
way to reason, since they are not required to attend to the question
representation (the conditioning variable) in its entirety to estimate the answer
likelihood. In this work, we propose a generative context selection model for
multi-hop question answering that reasons about how the given question could
have been generated given a context pair. While matching state-of-the-art
answering performance, our proposed generative passage selection model
performs better (4.9% higher than the baseline) on an adversarial held-out set
that tests the robustness of the model's multi-hop reasoning capabilities.
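As a rough illustration of the selection principle described above (not the paper's actual neural model), a generative selector scores each candidate context pair by the likelihood of the question conditioned on that pair, and picks the highest-scoring pair. The toy sketch below substitutes a smoothed unigram language model for the neural generator; all names and example strings are invented for illustration:

```python
import math
from collections import Counter


def question_log_likelihood(question, context_pair):
    """Toy stand-in for a generative model: log-likelihood of the
    question tokens under an add-one-smoothed unigram distribution
    estimated from the concatenated context pair."""
    context_tokens = (context_pair[0] + " " + context_pair[1]).lower().split()
    counts = Counter(context_tokens)
    vocab = set(context_tokens) | set(question.lower().split())
    total = len(context_tokens) + len(vocab)  # add-one smoothing denominator
    return sum(
        math.log((counts[tok] + 1) / total)
        for tok in question.lower().split()
    )


def select_context_pair(question, candidate_pairs):
    """Pick the candidate pair maximizing p(question | context pair)."""
    return max(candidate_pairs,
               key=lambda pair: question_log_likelihood(question, pair))


question = "Which city hosts the university where the inventor of X studied?"
pairs = [
    ("The inventor of X studied at University Y.",
     "University Y is located in City Z."),
    ("X is a popular product.",
     "City Z has a large port."),
]
best = select_context_pair(question, pairs)
```

The key contrast with a discriminative selector is the direction of conditioning: the score depends on every question token, so the model cannot ignore parts of the question the way a discriminative scorer can.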
Related papers
- More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering [53.09478307383865]
We introduce BiasPrompting, a novel inference framework for large language models (LLMs). It guides LLMs to generate and critically evaluate reasoning across all plausible answer options before reaching a final prediction. It demonstrates significant improvements on five widely used multiple-choice question answering benchmarks.
arXiv Detail & Related papers (2025-11-25T09:01:08Z)
- Finding Answers in Thought Matters: Revisiting Evaluation on Large Language Models with Reasoning [23.867629719024325]
We propose a basic framework: Answer Regeneration. The method uses an additional model inference, providing the prior input and output prefaced by the prompt "Answer:". We show that this extraction-rule-agnostic approach exhibits improved performance and enhanced robustness.
arXiv Detail & Related papers (2025-10-16T15:09:22Z)
- Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models [33.398631680508814]
We propose Answer-Consistent Reinforcement Learning that modifies the GRPO algorithm with an auxiliary consistency check. We design a consistency-verification reward that grants a high reward only if both the original and the post-shuffle answers agree and are correct. We evaluate ACRE on challenging video reasoning benchmarks and multimodal math reasoning benchmarks, achieving average improvements of 2.2% and 1.5%, respectively.
arXiv Detail & Related papers (2025-10-11T08:32:52Z)
- Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA [10.122669382758122]
We show that when questions are effectively unsolvable for a model, spurious chains of thought (CoTs) are more likely to appear. We adapt outcome-supervised reward models and reinforcement learning with group-relative advantage to incorporate solvability into their objectives. Our results highlight solvability as a key factor for reducing hallucinations and increasing reliability in CoT reasoning.
arXiv Detail & Related papers (2025-09-30T08:34:16Z)
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, the other answer choices can provide valuable clues for identifying the correct one.
Existing models often rank each choice separately, overlooking the context provided by the other choices.
We propose DCQA, a novel model that differentiates choices by identifying and eliminating their commonality.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z)
- Getting MoRE out of Mixture of Language Model Reasoning Experts [71.61176122960464]
We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
arXiv Detail & Related papers (2023-05-24T02:00:51Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Measuring and Narrowing the Compositionality Gap in Language Models [116.5228850227024]
We measure how often models can correctly answer all sub-problems but not generate the overall solution.
We present a new method, self-ask, that further improves on chain of thought.
arXiv Detail & Related papers (2022-10-07T06:50:23Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- A Semantic-based Method for Unsupervised Commonsense Question Answering [40.18557352036813]
Unsupervised commonsense question answering is appealing since it does not rely on any labeled task data.
We present a novel SEmantic-based Question Answering method (SEQA) for unsupervised commonsense question answering.
arXiv Detail & Related papers (2021-05-31T08:21:52Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We address multi-hop question generation, which aims to generate relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.