Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset
- URL: http://arxiv.org/abs/2311.05113v1
- Date: Thu, 9 Nov 2023 02:58:17 GMT
- Title: Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset
- Authors: Haoyi Wu, Wenyang Hui, Yezeng Chen, Weiqi Wu, Kewei Tu, Yi Zhou
- Abstract summary: We propose Conic10K, a challenging math problem dataset on conic sections in Chinese senior high school education.
Our dataset contains various problems with different reasoning depths, while only the knowledge from conic sections is required.
For each problem, we provide a high-quality formal representation, the reasoning steps, and the final solution.
- Score: 38.99073257782012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical understanding and reasoning are crucial tasks for assessing the
capabilities of artificial intelligence (AI). However, existing benchmarks
either require just a few steps of reasoning, or only contain a small amount of
data in one specific topic, making it hard to analyse AI's behaviour with
reference to different problems within a specific topic in detail. In this
work, we propose Conic10K, a challenging math problem dataset on conic sections
in Chinese senior high school education. Our dataset contains various problems
with different reasoning depths, while only the knowledge from conic sections
is required. Since the dataset only involves a narrow range of knowledge, it is
easy to separately analyse the knowledge a model possesses and the reasoning
ability it has. For each problem, we provide a high-quality formal
representation, the reasoning steps, and the final solution. Experiments show
that existing large language models, including GPT-4, exhibit weak performance
on complex reasoning. We hope that our findings could inspire more advanced
techniques for precise natural language understanding and reasoning. Our
dataset and codes are available at https://github.com/whyNLP/Conic10K.
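To make the annotation format concrete, here is a minimal sketch of how such records might be loaded and inspected in Python. The file name and the field names ("text", "formal", "steps", "answer") are hypothetical placeholders, not the repository's actual schema:

```python
import json

# Load a Conic10K-style dataset. Every problem carries a natural-language
# statement, a formal (symbolic) representation, annotated reasoning steps,
# and the final solution. All names below are illustrative assumptions.
with open("conic10k_train.json", encoding="utf-8") as f:
    problems = json.load(f)

for p in problems[:3]:
    print("Problem:", p["text"])      # natural-language statement
    print("Formal:", p["formal"])     # formal representation
    print("Steps:", p["steps"])       # annotated reasoning steps
    print("Answer:", p["answer"])     # final solution
```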
Related papers
- MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula [33.5782208232163]
We propose MathCAMPS, a method to synthesize high-quality mathematical problems at scale.
We encode each standard in a formal grammar, allowing us to sample diverse symbolic problems and their answers.
We derive follow-up questions from symbolic structures and convert them into follow-up word problems.
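The grammar-sampling idea can be illustrated with a toy context-free grammar. This is a minimal sketch of the general technique under simplified assumptions, not MathCAMPS's actual grammar or code:

```python
import random

# Toy context-free grammar for one-step arithmetic problems. Non-terminals
# map to lists of possible right-hand sides; everything else is a terminal.
GRAMMAR = {
    "problem": [["expr"]],
    "expr": [["num", "+", "num"], ["num", "-", "num"]],
}

def sample(symbol="problem"):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol == "num":
        return [str(random.randint(1, 99))]
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token such as "+" or "-"
    rhs = random.choice(GRAMMAR[symbol])
    return [tok for part in rhs for tok in sample(part)]

expression = " ".join(sample())
answer = eval(expression)  # safe: tokens come only from our own grammar
print(f"{expression} = {answer}")  # e.g. "42 + 7 = 49"
```

Because every problem is sampled together with its symbolic form, the ground-truth answer comes for free, which is what makes synthesis at scale cheap.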
arXiv Detail & Related papers (2024-07-01T01:56:28Z)
- CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities [25.857946070979576]
Concept and Hint-Annotated Math Problems (CHAMP) consists of high school math competition problems annotated with concepts.
This benchmark is difficult, with the best model only scoring 58.1% in standard settings.
We find that models often arrive at the correct final answer through wrong reasoning steps.
arXiv Detail & Related papers (2024-01-13T03:18:16Z)
- GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning [17.61621287003562]
We evaluate vision language models (VLMs) along various axes through the lens of geometry problems.
We procedurally create a synthetic dataset of geometry questions with controllable difficulty levels along multiple axes.
The empirical results obtained using our benchmark for state-of-the-art VLMs indicate that these models are not as capable in geometric (and, more generally, multi-step) reasoning as previous benchmarks suggest.
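As a rough illustration of "controllable difficulty along an axis", the sketch below chains a variable number of reasoning steps; it does not reflect GeomVerse's actual generator:

```python
import random

def generate_question(depth, seed=None):
    """Toy procedural generator: `depth` controls how many intermediate
    computations are needed to answer, i.e. one difficulty axis."""
    rng = random.Random(seed)
    side = rng.randint(2, 9)
    lines = [f"Square S0 has side {side}."]
    area = side * side
    for i in range(1, depth + 1):
        k = rng.randint(2, 4)
        lines.append(f"Square S{i} has {k} times the area of S{i-1}.")
        area *= k
    lines.append(f"What is the area of S{depth}?")
    return " ".join(lines), area

question, answer = generate_question(depth=3, seed=0)
print(question)
print("Answer:", answer)
```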
arXiv Detail & Related papers (2023-12-19T15:25:39Z)
- Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
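Contrastive pre-training of this kind is typically built on an InfoNCE-style objective over augmented question pairs. The sketch below shows that generic loss in PyTorch; it is an assumption-laden illustration, not QuesCo's exact formulation or its hierarchy-aware ranking:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.07):
    """Generic InfoNCE: row i of `positive` is the positive for row i of
    `anchor`; all other rows in the batch serve as negatives.

    anchor, positive: (batch, dim) embeddings of two augmented views
    (e.g. content-level and structure-level) of the same question.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # (batch, batch) similarities
    labels = torch.arange(anchor.size(0))       # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
a = torch.randn(8, 128)
p = a + 0.05 * torch.randn(8, 128)  # slightly perturbed second views
print(info_nce_loss(a, p).item())
```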
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
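To make a "chain of numerical reasoning" in conversation concrete, here is a hedged sketch of what a multi-turn record with an executable step list might look like. The field names and the tiny program format are illustrative assumptions, not ConvFinQA's actual schema:

```python
# A later turn may reuse values computed in earlier steps ("#0" refers to
# the result of step 0). All structure below is hypothetical.
record = {
    "context": "Revenue was 5829 in 2020 and 5735 in 2019.",
    "turns": [
        {"question": "What was the change in revenue?",
         "program": [("subtract", 5829, 5735)]},
        {"question": "And as a fraction of 2019 revenue?",
         "program": [("subtract", 5829, 5735), ("divide", "#0", 5735)]},
    ],
}

OPS = {"subtract": lambda a, b: a - b, "divide": lambda a, b: a / b}

def execute(program):
    """Run the step list; strings like '#i' reference step i's result."""
    results = []
    for op, a, b in program:
        resolve = lambda x: results[int(x[1:])] if isinstance(x, str) else x
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]

for turn in record["turns"]:
    print(turn["question"], "->", round(execute(turn["program"]), 4))
```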
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (ScienceQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that chain-of-thought (CoT) explanations improve question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning [20.81312285957089]
We build a comprehensive dataset, named LogiQA, which is sourced from expert-written questions for testing human logical reasoning.
Results show that state-of-the-art neural models perform far below the human ceiling.
Our dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.
arXiv Detail & Related papers (2020-07-16T05:52:16Z)
- PuzzLing Machines: A Challenge on Learning From Small Data [64.513459448362]
We introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students.
Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages.
We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected.
arXiv Detail & Related papers (2020-04-27T20:34:26Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)