Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
- URL: http://arxiv.org/abs/2205.12496v1
- Date: Wed, 25 May 2022 05:13:21 GMT
- Title: Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
- Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
- Abstract summary: Question-answering datasets require a broad set of reasoning skills.
We show how to use question decompositions to teach these broad reasoning skills in a robust fashion.
- Score: 50.114651561111245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question-answering datasets require a broad set of reasoning skills. We show
how to use question decompositions to teach language models these broad
reasoning skills in a robust fashion. Specifically, we use widely available
QDMR representations to programmatically create synthetic contexts for real
questions in six multihop reasoning datasets. These contexts are carefully
designed to avoid common reasoning shortcuts prevalent in real contexts that
prevent models from learning the right skills. This results in a pretraining
dataset, named TeaBReaC, containing 525K multihop questions (with associated
formal programs) covering about 900 reasoning patterns. We show that
pretraining standard language models (LMs) on TeaBReaC before fine-tuning them
on target datasets improves their performance by up to 13 EM points across 3
multihop QA datasets, with a 30 point gain on more complex questions. The
resulting models also demonstrate higher robustness, with a 6-11 point
improvement on two contrast sets. Furthermore, TeaBReaC pretraining
substantially improves model performance and robustness even when starting with
numeracy-aware LMs pretrained using recent methods (e.g., PReasM). Our work
thus shows how one can effectively use decomposition-guided contexts to
robustly teach multihop reasoning.
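As a rough illustration of the idea (not the authors' actual TeaBReaC pipeline, whose templates and QDMR handling are far more elaborate), a question decomposition can be rendered into a synthetic context by filling each reasoning step with arbitrary placeholder entities and values, so that answering requires following the steps rather than exploiting surface shortcuts. All names and templates below are hypothetical:

```python
# Hypothetical sketch: render a QDMR-style decomposition into a synthetic
# context. Each step becomes one sentence built from synthetic facts, so the
# answer is only recoverable by chaining the steps ("P1" -> "T1" -> year).

def build_synthetic_context(steps, facts):
    """Render one sentence per decomposition step from synthetic facts."""
    sentences = [step.format(**facts) for step in steps]
    return " ".join(sentences)

# A two-hop "bridge" pattern: find the person's team, then that team's
# founding year. Placeholder entities avoid real-world knowledge shortcuts.
steps = [
    "{person} plays for {team}.",
    "{team} was founded in {year}.",
]
facts = {"person": "P1", "team": "T1", "year": "1907"}

context = build_synthetic_context(steps, facts)
answer = facts["year"]  # the gold answer for "When was P1's team founded?"
```

In this toy version, varying which facts are distractors versus bridges across many generated examples is what would force a model to learn the underlying reasoning pattern instead of memorizing entities.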
Related papers
- LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters! [53.84130385074551]
Large reasoning models (LRMs) tackle complex reasoning problems by following long chains of thought (Long CoT).
We find that a large language model (LLM) can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank adaptation (LoRA).
With just 17k long CoT training samples, the Qwen2.5-32B-Instruct model achieves significant improvements on a wide range of math and coding benchmarks.
arXiv Detail & Related papers (2025-02-11T08:48:48Z) - STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training [87.58996020705258]
Video Large Language Models (Video-LLMs) have recently shown strong performance on basic video understanding tasks.
However, Video-LLMs struggle with compositional reasoning that requires multi-step spatio-temporal inference across object relations, interactions, and events.
We propose STEP, a novel graph-guided self-training method that enables Video-LLMs to generate reasoning-rich fine-tuning data from any raw videos to improve themselves.
arXiv Detail & Related papers (2024-11-29T11:54:55Z) - Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z) - Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification [41.330719056639616]
We study the entailment verification problem of multi-sentence premises.
Modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning.
arXiv Detail & Related papers (2024-02-06T04:14:09Z) - Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting [0.2552922646705803]
The Chain-of-Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks.
The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students.
arXiv Detail & Related papers (2023-09-30T06:25:27Z) - Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval [9.136948771060895]
We evaluate two methods for further improvement in this setting.
Both focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system.
Our single best Reasoning model materially improves upon strong comparable prior baselines for unseen evaluation datasets.
arXiv Detail & Related papers (2023-08-09T05:06:39Z) - STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z) - How Well Do Multi-hop Reading Comprehension Models Understand Date Information? [31.243088887839257]
The ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear.
It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems.
arXiv Detail & Related papers (2022-10-11T07:24:07Z) - Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z) - Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.