Related papers: MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

URL: http://arxiv.org/abs/2408.01779v1
Date: Sat, 3 Aug 2024 13:28:19 GMT
Title: MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems
Authors: Wenbei Xie, Donglin Liu, Haoran Yan, Wenjie Wu, Zongyang Liu,
Abstract summary: We propose an agent framework for learning to solve mathematical problems based on inductive reasoning. By emulating the human learning process of generalization of learned information, this framework has great performance in the mathematical reasoning process. Our model can be used as a personalised learning aid, thus reducing the inequality of educational resources.
Score: 0.936726079405677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the development of artificial intelligence (AI), large language models (LLM) are widely used in many fields. However, the reasoning ability of LLM is still very limited when it comes to mathematical reasoning. Mathematics plays an important role in all aspects of human society and is a technical guarantee in the fields of healthcare, transport and aerospace, for this reason, the development of AI big language models in the field of mathematics has great potential significance. To improve the mathematical reasoning ability of large language models, we proposed an agent framework for learning to solve mathematical problems based on inductive reasoning. By emulating the human learning process of generalization of learned information and effective application of previous knowledge in new reasoning tasks, this framework has great performance in the mathematical reasoning process. It improves global accuracy over the baseline method (chain-of-thought) by 20.96% and solves 17.54% of the mathematical problems that the baseline cannot solve. Benefiting from the efficient RETRIEVAL method, our model improves the ability of large language models to efficiently use external knowledge, i.e., the mathematical computation of the model can be based on written procedures. In education, our model can be used as a personalised learning aid, thus reducing the inequality of educational resources.

Related papers

Large Language Models for Mathematical Analysis [3.7325315394927023]
This work addresses critical gaps in mathematical reasoning and contributes to advancing trustworthy AI. We developed the DEMI-MathAnalysis dataset, comprising proof-based problems from mathematical analysis topics. We also designed a guiding framework to rigorously enhance LLMs' ability to solve these problems.
arXiv Detail & Related papers (2024-12-28T20:37:55Z)
Formal Mathematical Reasoning: A New Frontier in AI [60.26950681543385]
We advocate for formal mathematical reasoning and argue that it is indispensable for advancing AI4Math to the next level. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success.
arXiv Detail & Related papers (2024-12-20T17:19:24Z)
LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving. LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge. It successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models. MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks [34.09857430966818]
We introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks. We conduct fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH. Our experiments reveal that among the three models, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems.
arXiv Detail & Related papers (2024-04-19T08:45:42Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
We present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API)
arXiv Detail & Related papers (2024-02-27T05:50:35Z)
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models [67.32868432113587]
This paper introduces ConceptMath, a fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs) Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts.
arXiv Detail & Related papers (2024-02-22T16:06:49Z)
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics [0.0]
Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving. This project introduces Math Agents and mathematical embedding as fresh entries to the "Moore's Law of Mathematics" Project aims to use Math Agents and mathematical embeddings to address the ageing issue in information systems biology.
arXiv Detail & Related papers (2023-07-04T20:16:32Z)
Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs) We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics. We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems. Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.