Evaluating the Rationale Understanding of Critical Reasoning in Logical
Reading Comprehension
- URL: http://arxiv.org/abs/2311.18353v1
- Date: Thu, 30 Nov 2023 08:44:55 GMT
- Title: Evaluating the Rationale Understanding of Critical Reasoning in Logical
Reading Comprehension
- Authors: Akira Kawabata, Saku Sugawara
- Abstract summary: We crowdsource rationale texts that explain why we should select or eliminate answer options from a logical reading comprehension dataset.
Experiments show that recent large language models (e.g., InstructGPT) struggle to answer the subquestions even if they are able to answer the main questions correctly.
- Score: 13.896697187967547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To precisely evaluate a language model's capability for logical reading
comprehension, we present a dataset for testing the understanding of the
rationale behind critical reasoning. For questions taken from an existing
multiple-choice logical reading comprehension dataset, we crowdsource rationale
texts that explain why we should select or eliminate answer options, resulting
in 3,003 multiple-choice subquestions that are associated with 943 main
questions. Experiments on our dataset show that recent large language models
(e.g., InstructGPT) struggle to answer the subquestions even if they are able
to answer the main questions correctly. We find that the models perform
particularly poorly in answering subquestions written for the incorrect options
of the main questions, implying that the models have a limited capability for
explaining why incorrect alternatives should be eliminated. These results
suggest that our dataset encourages further investigation into the critical
reasoning ability of language models while focusing on the elimination process
of relevant alternatives.
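To make the evaluation setup concrete, the following is a minimal sketch of how one might compare a model's accuracy on the main questions against its accuracy on the rationale subquestions, separating out the subquestions written for incorrect options (the elimination rationales). The record layout (`main_question`, `subquestions`, `targets_incorrect_option`) and the `ask_model` callable are illustrative assumptions, not the authors' released data format or evaluation code.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: one main multiple-choice question plus the
# rationale subquestions crowdsourced for its answer options (assumption,
# not the dataset's actual schema).
Example = Dict[str, object]


def evaluate(dataset: List[Example],
             ask_model: Callable[[str, List[str]], int]) -> Dict[str, float]:
    """Compare accuracy on main questions vs. rationale subquestions.

    `ask_model(question, options)` is assumed to return the index of the
    option the model selects; any LLM client could be wrapped this way.
    """
    main_correct = 0
    sub_total = sub_correct = 0
    elim_total = elim_correct = 0  # subquestions about *incorrect* options

    for ex in dataset:
        pred = ask_model(ex["main_question"], ex["main_options"])
        main_correct += int(pred == ex["main_answer"])

        for sq in ex["subquestions"]:
            pred = ask_model(sq["question"], sq["options"])
            hit = int(pred == sq["answer"])
            sub_total += 1
            sub_correct += hit
            if sq["targets_incorrect_option"]:  # elimination rationale
                elim_total += 1
                elim_correct += hit

    return {
        "main_accuracy": main_correct / len(dataset),
        "subquestion_accuracy": sub_correct / max(sub_total, 1),
        "elimination_subq_accuracy": elim_correct / max(elim_total, 1),
    }
```

Under this sketch, the paper's central finding would show up as `elimination_subq_accuracy` falling well below `main_accuracy`, even on examples whose main question the model answers correctly.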
Related papers
- Getting MoRE out of Mixture of Language Model Reasoning Experts [71.61176122960464]
We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
arXiv Detail & Related papers (2023-05-24T02:00:51Z)
- STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z)
- APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning [73.3035118224719]
We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities.
APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
arXiv Detail & Related papers (2022-12-19T07:40:02Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought improves SQA question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- Inferring Implicit Relations with Language Models [38.70860544044594]
We investigate why current models struggle with implicit reasoning question answering tasks.
We construct a benchmark, IMPLICITRELATIONS, where given a question, a model should output a list of concept-relation pairs.
Using IMPLICITRELATIONS, we evaluate models from the GPT-3 family and find that, while these models struggle on the implicit reasoning QA task, they often succeed at inferring implicit relations.
arXiv Detail & Related papers (2022-04-28T21:00:54Z)
- ListReader: Extracting List-form Answers for Opinion Questions [18.50111430378249]
ListReader is a neural extractive QA model for list-form answers.
In addition to learning the alignment between the question and content, we introduce a heterogeneous graph neural network.
Our model adopts a co-extraction setting that can extract either span- or sentence-level answers.
arXiv Detail & Related papers (2021-10-22T10:33:08Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model achieves better performance (4.9% higher than the baseline) on the adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Discrete Reasoning Templates for Natural Language Understanding [79.07883990966077]
We present an approach that reasons about complex questions by decomposing them to simpler subquestions.
We derive the final answer according to instructions in a predefined reasoning template.
We show that our approach is competitive with the state-of-the-art while being interpretable and requiring little supervision.
arXiv Detail & Related papers (2021-04-05T18:56:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.