LogiQA: A Challenge Dataset for Machine Reading Comprehension with
Logical Reasoning
- URL: http://arxiv.org/abs/2007.08124v1
- Date: Thu, 16 Jul 2020 05:52:16 GMT
- Title: LogiQA: A Challenge Dataset for Machine Reading Comprehension with
Logical Reasoning
- Authors: Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang
- Abstract summary: We build a comprehensive dataset, named LogiQA, which is sourced from expert-written questions for testing human logical reasoning.
Results show that state-of-the-art neural models perform far worse than the human ceiling.
Our dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine reading is a fundamental task for testing the capability of natural
language understanding, which is closely related to human cognition in many
aspects. With the rise of deep learning techniques, algorithmic models rival
human performance on simple QA, and thus increasingly challenging machine
reading datasets have been proposed. Although various challenges such as
evidence integration and commonsense knowledge have been incorporated, one of
the fundamental capabilities in human reading, namely logical reasoning, has
not been fully investigated. We build a comprehensive dataset, named LogiQA,
which is sourced from expert-written questions for testing human logical
reasoning. It
consists of 8,678 QA instances, covering multiple types of deductive reasoning.
Results show that state-of-the-art neural models perform far worse than the
human ceiling. Our dataset can also serve as a benchmark for reinvestigating
logical AI under the deep learning NLP setting. The dataset is freely available
at https://github.com/lgw863/LogiQA-dataset
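As a quick-start illustration (not part of the paper), the following Python
sketch shows one way the released split files might be parsed. It assumes the
plain-text record layout used in the repository (a blank separator line, the
correct option label, the passage, the question, and four option lines per
instance) and the file name Train.txt; both are assumptions to check against
the repository's README.

# Hedged sketch: parse a LogiQA split file into QA instances.
# ASSUMPTION: each record spans 8 lines (blank line, answer label a-d,
# passage, question, four options); verify against the repo's README.
from dataclasses import dataclass
from typing import List

@dataclass
class LogiQAInstance:
    answer: str         # correct option label, e.g. "a"
    passage: str        # context paragraph the question is grounded in
    question: str       # question text
    options: List[str]  # four candidate answers

def load_logiqa(path: str) -> List[LogiQAInstance]:
    """Parse one split file (e.g. Train.txt) into QA instances."""
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f]
    instances = []
    # Walk the file in fixed 8-line records; lines[i] is the separator.
    for i in range(0, len(lines) - 7, 8):
        instances.append(LogiQAInstance(
            answer=lines[i + 1],
            passage=lines[i + 2],
            question=lines[i + 3],
            options=lines[i + 4:i + 8],
        ))
    return instances

if __name__ == "__main__":
    data = load_logiqa("Train.txt")  # hypothetical local copy of a split
    print(len(data), "instances loaded")
    print(data[0].question)

A simple sanity check is that the instance counts across the train, eval, and
test splits should sum to the 8,678 reported in the abstract; if they do not,
the record-layout assumption above is likely wrong.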
Related papers
- Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset (arXiv, 2023-11-09)
We propose Conic10K, a challenging math problem dataset on conic sections from Chinese senior high school education.
Our dataset contains problems of varying reasoning depth, while requiring only knowledge of conic sections.
For each problem, we provide a high-quality formal representation, the reasoning steps, and the final solution.
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning (arXiv, 2023-10-24)
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, its instances are free-text narratives corresponding to real-world domains of reasoning.
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations (arXiv, 2022-12-04)
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets, given the abundant commonsense knowledge they provide.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
- JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions (arXiv, 2022-10-18)
We propose a new commonsense reasoning dataset based on human-written Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on assessing functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging for earlier machine reading models as well as new large language models.
- On Explainability in AI-Solutions: A Cross-Domain Survey (arXiv, 2022-10-11)
In automatically deriving a system model, AI algorithms learn relations in data that are not detectable by humans.
The more complex a model, the more difficult it is for a human to understand the reasoning behind its decisions.
This work provides an extensive survey of the literature on this topic, which to a large extent consists of other surveys.
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering (arXiv, 2022-10-07)
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
- Understanding Unnatural Questions Improves Reasoning over Text (arXiv, 2020-10-19)
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension (arXiv, 2019-12-29)
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.