What Makes Math Word Problems Challenging for LLMs?
- URL: http://arxiv.org/abs/2403.11369v2
- Date: Mon, 1 Apr 2024 13:58:34 GMT
- Title: What Makes Math Word Problems Challenging for LLMs?
- Authors: KV Aditya Srivatsa, Ekaterina Kochmar,
- Abstract summary: We conduct an in-depth analysis of the key linguistic and mathematical characteristics of math word problems (MWPs)
We train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent large language models (LLMs)
- Score: 5.153388971862429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the question of what makes math word problems (MWPs) in English challenging for large language models (LLMs). We conduct an in-depth analysis of the key linguistic and mathematical characteristics of MWPs. In addition, we train feature-based classifiers to better understand the impact of each feature on the overall difficulty of MWPs for prominent LLMs and investigate whether this helps predict how well LLMs fare against specific categories of MWPs.
Related papers
- Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs.
We evaluate their performance on pairs of existing math word problems together.
arXiv Detail & Related papers (2024-10-02T17:01:10Z) - Cutting Through the Noise: Boosting LLM Performance on Math Word Problems [52.99006895757801]
Large Language Models excel at solving math word problems, but struggle with real-world problems containing irrelevant information.
We propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables.
Fine-tuning on adversarial training instances improves performance on adversarial MWPs by 8%.
arXiv Detail & Related papers (2024-05-30T18:07:13Z) - Can LLMs Solve longer Math Word Problems Better? [47.227621867242]
Math Word Problems (MWPs) are crucial for evaluating the capability of Large Language Models (LLMs)
This study pioneers the exploration of Context Length Generalizability (CoLeG)
Two novel metrics are proposed to assess the efficacy and resilience of LLMs in solving these problems.
arXiv Detail & Related papers (2024-05-23T17:13:50Z) - Benchmarking Hallucination in Large Language Models based on
Unanswerable Math Word Problem [58.3723958800254]
Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks.
They are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination.
This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP)
arXiv Detail & Related papers (2024-03-06T09:06:34Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - MWP-BERT: A Strong Baseline for Math Word Problems [47.51572465676904]
Math word problem (MWP) solving is the task of transforming a sequence of natural language problem descriptions to executable math equations.
Although recent sequence modeling MWP solvers have gained credits on the math-text contextual understanding, pre-trained language models (PLM) have not been explored for solving MWP.
We introduce MWP-BERT to obtain pre-trained token representations that capture the alignment between text description and mathematical logic.
arXiv Detail & Related papers (2021-07-28T15:28:41Z) - A Diverse Corpus for Evaluating and Developing English Math Word Problem
Solvers [10.244215079409797]
We present ASDiv, a diverse (in terms of both language patterns and problem types) English math word problem (MWP) corpus.
Existing MWP corpora for studying AI progress remain limited either in language usage patterns or in problem types.
We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school.
arXiv Detail & Related papers (2021-06-30T01:54:11Z) - Are NLP Models really able to Solve Simple Math Word Problems? [7.433931244705934]
We show that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs.
We introduce a challenge dataset, SVAMP, created by applying carefully chosen variations over sampled from existing datasets.
The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, thus showing that much remains to be done even for the simplest of the MWPs.
arXiv Detail & Related papers (2021-03-12T10:23:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.