Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring
- URL: http://arxiv.org/abs/2411.00042v2
- Date: Sun, 17 Nov 2024 00:59:42 GMT
- Title: Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring
- Authors: Amogh Akella,
- Abstract summary: We develop a machine learning model for problem categorization and show that its accuracy can be significantly improved through the creation of well-designed training datasets.
We believe that our approach works by helping reduce hallucinations in LLMs, which is a critical step toward unlocking their potential to tackle advanced mathematical problems.
- Score: 0.0
- License:
- Abstract: In this paper, we investigate how to harness large language models (LLMs) to solve mathematical problems both quickly and accurately. Specifically, we demonstrate the effectiveness of classifying problems into distinct categories and applying category-specific problem-solving strategies to enhance the math performance of LLMs. We develop a straightforward machine learning model for problem categorization and show that its accuracy can be significantly improved through the creation of well-designed training datasets. We believe that our approach works by helping reduce hallucinations in LLMs, which is a critical step toward unlocking their potential to tackle advanced mathematical problems.
Related papers
- Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval [22.865124583257987]
We present how analogy from similarly structured questions can improve large language models' problem-solving capabilities.
Specifically, we rely on the retrieval of problems with similar computational graphs to the given question to serve as exemplars in the prompt.
Empirical results across six math word problem datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-11-25T15:01:25Z) - Large Language Models for Combinatorial Optimization of Design Structure Matrix [4.513609458468522]
Combinatorial optimization (CO) is essential for improving efficiency and performance in engineering applications.
When it comes to real-world engineering problems, algorithms based on pure mathematical reasoning are limited and incapable to capture the contextual nuances necessary for optimization.
This study explores the potential of Large Language Models (LLMs) in solving engineering CO problems by leveraging their reasoning power and contextual knowledge.
arXiv Detail & Related papers (2024-11-19T15:39:51Z) - BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs)
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z) - MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data [20.31528845718877]
Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities.
This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset.
arXiv Detail & Related papers (2024-06-26T13:02:35Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs [2.3020018305241337]
Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models.
We propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions.
Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
arXiv Detail & Related papers (2024-04-11T22:19:50Z) - SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models [4.090307917818891]
We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning.
We propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning.
arXiv Detail & Related papers (2024-04-05T04:25:47Z) - SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving [64.38649623473626]
Large Language Models (LLMs) have driven substantial progress in artificial intelligence.
We propose a novel framework called textbfSEquential subtextbfGoal textbfOptimization (SEGO) to enhance LLMs' ability to solve mathematical problems.
arXiv Detail & Related papers (2023-10-19T17:56:40Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models [70.5763210869525]
We introduce an expansive benchmark suite SciBench for Large Language Model (LLM)
SciBench contains a dataset featuring a range of collegiate-level scientific problems from mathematics, chemistry, and physics domains.
The results reveal that the current LLMs fall short of delivering satisfactory performance, with the best overall score of merely 43.22%.
arXiv Detail & Related papers (2023-07-20T07:01:57Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View
Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs)
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.