Adversarial Examples for Evaluating Math Word Problem Solvers
- URL: http://arxiv.org/abs/2109.05925v1
- Date: Mon, 13 Sep 2021 12:47:40 GMT
- Title: Adversarial Examples for Evaluating Math Word Problem Solvers
- Authors: Vivek Kumar, Rishabh Maheshwary, Vikram Pudi
- Abstract summary: Math Word Problem (MWP) solvers have achieved high performance on benchmark datasets.
The extent to which existing MWP solvers truly understand language and its relation with numbers is still unclear.
We generate adversarial attacks to evaluate the robustness of state-of-the-art MWP solvers.
- Score: 4.266990593059533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard accuracy metrics have shown that Math Word Problem (MWP) solvers
have achieved high performance on benchmark datasets. However, the extent to
which existing MWP solvers truly understand language and its relation with
numbers is still unclear. In this paper, we generate adversarial attacks to
evaluate the robustness of state-of-the-art MWP solvers. We propose two methods,
Question Reordering and Sentence Paraphrasing, to generate adversarial attacks.
We conduct experiments across three neural MWP solvers over two benchmark
datasets. On average, our attack method is able to reduce the accuracy of MWP
solvers by over 40 percentage points on these datasets. Our results demonstrate
that existing MWP solvers are sensitive to linguistic variations in the problem
text. We verify the validity and quality of generated adversarial examples
through human evaluation.
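As a rough illustration of what a Question Reordering perturbation might look like, the sketch below moves the question sentence of a problem to the front while leaving every number and every other sentence unchanged. The abstract does not spell out the procedure, so this reading of the method, the naive sentence splitter, and the function name `reorder_question` are all assumptions made for illustration; it is not the authors' implementation.

```python
import re


def reorder_question(problem: str) -> str:
    """Hypothetical helper: move question sentences to the start of the text.

    Assumption: a "question" is any sentence ending in '?'. Sentences are
    split naively on '.', '?' or '!' followed by whitespace, which can
    mis-split abbreviations such as "Mr. Smith".
    """
    text = problem.strip()
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text) if s]
    questions = [s for s in sentences if s.endswith("?")]
    if not questions:
        # Nothing to reorder; return the problem unchanged.
        return text
    context = [s for s in sentences if not s.endswith("?")]
    # The quantities and wording are untouched; only sentence order changes,
    # so the correct equation and answer stay the same.
    return " ".join(questions + context)


if __name__ == "__main__":
    original = "Tom has 5 apples. He buys 3 more. How many apples does Tom have now?"
    print(reorder_question(original))
```

Because only the sentence order changes, a solver that gives a different answer on the reordered text is reacting to surface form rather than to the underlying quantities, which is the kind of sensitivity the abstract reports.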
Related papers
- Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions [48.251724997889184]
We develop a benchmark called Problems with Missing and Contradictory conditions (PMC).
We introduce two novel metrics to evaluate the performance of few-shot prompting methods in these scenarios.
We propose a novel few-shot prompting method called SMT-LIB Prompting (SLP), which utilizes the SMT-LIB language to model the problems instead of solving them directly.
arXiv Detail & Related papers (2024-06-07T16:24:12Z)
- Cutting Through the Noise: Boosting LLM Performance on Math Word Problems [52.99006895757801]
Large Language Models excel at solving math word problems, but struggle with real-world problems containing irrelevant information.
We propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables.
Fine-tuning on adversarial training instances improves performance on adversarial MWPs by 8%.
arXiv Detail & Related papers (2024-05-30T18:07:13Z)
- Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
- MWPRanker: An Expression Similarity Based Math Word Problem Retriever [12.638925774492403]
Math Word Problems (MWPs) in online assessments help test the ability of the learner to make critical inferences.
We propose a tool in this work for MWP retrieval.
arXiv Detail & Related papers (2023-07-03T15:44:18Z)
- Unbiased Math Word Problems Benchmark for Mitigating Solving Bias [72.8677805114825]
Current solvers exhibit solving bias, which consists of data bias and learning bias caused by biased datasets and improper training strategies.
Our experiments verify that MWP solvers are easily biased by training datasets that do not cover diverse questions for each problem narrative.
An MWP can be naturally solved by multiple equivalent equations while current datasets take only one of the equivalent equations as ground truth.
arXiv Detail & Related papers (2022-05-17T06:07:04Z)
- Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints [37.493809561634386]
We study the problem of generating arithmetic math word problems (MWPs) given a math equation.
Existing approaches are prone to generating MWPs that are mathematically invalid or have unsatisfactory language quality.
arXiv Detail & Related papers (2021-09-09T20:24:25Z)
- Generate & Rank: A Multi-task Framework for Math Word Problems [48.99880318686938]
Solving math word problems (MWPs) is a challenging and critical task in natural language processing.
We propose Generate & Rank, a framework based on a generative pre-trained language model.
By joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions.
arXiv Detail & Related papers (2021-09-07T12:21:49Z)
- MWP-BERT: A Strong Baseline for Math Word Problems [47.51572465676904]
Math word problem (MWP) solving is the task of transforming a sequence of natural language problem descriptions to executable math equations.
Although recent sequence-modeling MWP solvers have earned recognition for math-text contextual understanding, pre-trained language models (PLMs) have not been explored for solving MWPs.
We introduce MWP-BERT to obtain pre-trained token representations that capture the alignment between text description and mathematical logic.
arXiv Detail & Related papers (2021-07-28T15:28:41Z)
- Are NLP Models really able to Solve Simple Math Word Problems? [7.433931244705934]
We show that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs.
We introduce a challenge dataset, SVAMP, created by applying carefully chosen variations over examples sampled from existing datasets.
The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, thus showing that much remains to be done even for the simplest of the MWPs.
arXiv Detail & Related papers (2021-03-12T10:23:47Z)