Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
- URL: http://arxiv.org/abs/2205.12496v1
- Date: Wed, 25 May 2022 05:13:21 GMT
- Title: Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
- Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
- Abstract summary: Question-answering datasets require a broad set of reasoning skills.
We show how to use question decompositions to teach these broad reasoning skills in a robust fashion.
- Score: 50.114651561111245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question-answering datasets require a broad set of reasoning skills. We show
how to use question decompositions to teach language models these broad
reasoning skills in a robust fashion. Specifically, we use widely available
QDMR representations to programmatically create synthetic contexts for real
questions in six multihop reasoning datasets. These contexts are carefully
designed to avoid common reasoning shortcuts prevalent in real contexts that
prevent models from learning the right skills. This results in a pretraining
dataset, named TeaBReaC, containing 525K multihop questions (with associated
formal programs) covering about 900 reasoning patterns. We show that
pretraining standard language models (LMs) on TeaBReaC before fine-tuning them
on target datasets improves their performance by up to 13 EM points across 3
multihop QA datasets, with a 30 point gain on more complex questions. The
resulting models also demonstrate higher robustness, with a 6-11 point
improvement on two contrast sets. Furthermore, TeaBReaC pretraining
substantially improves model performance and robustness even when starting with
numeracy-aware LMs pretrained using recent methods (e.g., PReasM). Our work
thus shows how one can effectively use decomposition-guided contexts to
robustly teach multihop reasoning.
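As a rough illustration of what "programmatically creating a synthetic context" from a decomposition could look like, here is a minimal Python sketch. It is not the TeaBReaC generation procedure: the three-step decomposition, the entity names (`Film_…`, `Person_A`, `Person_B`), the sentence templates, and the function `build_example` are all hypothetical and chosen only for illustration. The sketch grounds each decomposition step in invented facts and adds a distractor so that one simple shortcut (returning the largest year mentioned anywhere in the context) yields the wrong answer.

```python
import random


def build_example(seed=0):
    """Build one toy question with a synthetic context and its gold answer."""
    rng = random.Random(seed)

    # Made-up entity names, so the answer cannot be recalled from memorized facts.
    films = [f"Film_{i}" for i in rng.sample(range(1000, 10000), 4)]
    target_films, distractor_film = films[:3], films[3]

    # Distinct release years for the three films that satisfy step 1.
    target_years = dict(zip(target_films, rng.sample(range(1950, 2000), 3)))

    # The distractor is released later than every target film, so the shortcut
    # "return the largest year in the context" gives a wrong answer.
    distractor_year = max(target_years.values()) + rng.randint(1, 20)

    facts = [f"{film} was directed by Person_A." for film in target_films]
    facts += [f"{film} was released in {year}." for film, year in target_years.items()]
    facts.append(f"{distractor_film} was directed by Person_B.")
    facts.append(f"{distractor_film} was released in {distractor_year}.")
    rng.shuffle(facts)

    # Execute the hypothetical three-step decomposition over the facts.
    step1 = target_films                            # select: films directed by Person_A
    step2 = [target_years[film] for film in step1]  # project: release year of #1
    answer = max(step2)                             # aggregate: highest of #2

    return {
        "question": "What is the latest release year among films directed by Person_A?",
        "context": " ".join(facts),
        "answer": answer,
        "shortcut_answer": distractor_year,
    }


example = build_example(seed=0)
print(example["question"])
print(example["context"])
print("answer:", example["answer"])
```

Because the gold answer is computed by running the steps rather than written by hand, every generated question comes with a context in which each intermediate step is actually required.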
Related papers
- Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification [41.330719056639616]
We study the entailment verification problem of multi-sentence premises.
Modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning.
arXiv Detail & Related papers (2024-02-06T04:14:09Z)
- Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting [0.2552922646705803]
The Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks.
The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students.
arXiv Detail & Related papers (2023-09-30T06:25:27Z)
- Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval [9.136948771060895]
We evaluate two methods for further improvement in this setting.
Both focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system.
Our single best Reasoning model materially improves upon strong comparable prior baselines for unseen evaluation datasets.
arXiv Detail & Related papers (2023-08-09T05:06:39Z)
- STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z)
- How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z)
- Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)