Unbiased Math Word Problems Benchmark for Mitigating Solving Bias
- URL: http://arxiv.org/abs/2205.08108v1
- Date: Tue, 17 May 2022 06:07:04 GMT
- Title: Unbiased Math Word Problems Benchmark for Mitigating Solving Bias
- Authors: Zhicheng Yang, Jinghui Qin, Jiaqi Chen, and Xiaodan Liang
- Abstract summary: Current solvers suffer from solving bias, which consists of data bias and learning bias caused by biased datasets and improper training strategies.
Our experiments verify that MWP solvers are easily biased by training datasets that do not cover diverse questions for each problem narrative.
An MWP can often be solved by multiple equivalent equations, yet current datasets take only one of them as the ground truth.
- Score: 72.8677805114825
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we revisit the solving bias when evaluating models on current
Math Word Problem (MWP) benchmarks. Current solvers suffer from solving bias,
which consists of data bias and learning bias caused by biased datasets and
improper training strategies. Our experiments verify that MWP solvers are
easily biased by training datasets that do not cover diverse questions for
each problem narrative, so a solver can only learn shallow
heuristics rather than deep semantics for understanding problems. Besides, an
MWP can be naturally solved by multiple equivalent equations while current
datasets take only one of the equivalent equations as ground truth, forcing the
model to match the labeled ground truth while ignoring other equivalent
equations. Here, we first introduce a novel MWP dataset named UnbiasedMWP which
is constructed by varying the grounded expressions in our collected data and
annotating them with corresponding multiple new questions manually. Then, to
further mitigate learning bias, we propose a Dynamic Target Selection (DTS)
Strategy to dynamically select more suitable target expressions according to
the longest prefix match between the current model output and candidate
equivalent equations which are obtained by applying commutative law during
training. The results show that our UnbiasedMWP has significantly fewer biases
than its original data and other datasets, posing a promising benchmark for
fairly evaluating the solvers' reasoning skills rather than matching nearest
neighbors. Solvers trained with our DTS achieve higher accuracies on
multiple MWP benchmarks. The source code is available at
https://github.com/yangzhch6/UnbiasedMWP.
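The core of the DTS strategy described above can be sketched as a longest-prefix-match selection among equivalent target equations. The sketch below is illustrative, not the authors' implementation: it assumes equations are token lists in prefix notation and that the commutative-law variants have already been enumerated.

```python
def longest_prefix_len(a, b):
    """Length of the common token prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def select_target(partial_output, candidates):
    """Dynamic Target Selection (sketch): among equivalent ground-truth
    equations, pick the one whose token prefix best matches the model's
    current output, so training does not punish the model for emitting
    a different-but-equivalent operand order."""
    return max(candidates, key=lambda c: longest_prefix_len(partial_output, c))

# Equivalent targets for "3 + 5 * 2" in prefix notation, obtained by
# applying the commutative law to + and *:
candidates = [
    ["+", "3", "*", "5", "2"],
    ["+", "*", "5", "2", "3"],
    ["+", "3", "*", "2", "5"],
    ["+", "*", "2", "5", "3"],
]
partial = ["+", "*", "2"]  # what the decoder has emitted so far
print(select_target(partial, candidates))  # → ['+', '*', '2', '5', '3']
```

With the selected candidate as the training target, the remaining decoding steps are supervised against the equation the model is already closest to producing.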
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias by either weighting the objective of each sample n by 1/p(u_n|b_n) or sampling that sample with a weight proportional to 1/p(u_n|b_n).
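A toy sketch of this reweighting scheme, with p(u|b) estimated from empirical counts (illustrative only, not the paper's code):

```python
from collections import Counter

def inverse_conditional_weights(classes, biases):
    """Weight each sample n by 1 / p(u_n | b_n), where u is the class
    attribute and b the non-class (bias) attribute, with the conditional
    probability estimated from empirical counts. Samples whose class is
    over-represented given their bias attribute are down-weighted."""
    pair_counts = Counter(zip(classes, biases))
    bias_counts = Counter(biases)
    return [
        bias_counts[b] / pair_counts[(u, b)]  # = 1 / p(u | b)
        for u, b in zip(classes, biases)
    ]

# Toy data: class "cat" strongly correlates with bias attribute "indoor",
# so each "cat" sample gets a small weight and the rare "dog" a large one.
classes = ["cat", "cat", "cat", "dog"]
biases = ["indoor", "indoor", "indoor", "indoor"]
print(inverse_conditional_weights(classes, biases))
```

The same weights could instead drive a sampler, matching the paper's alternative of sampling each example with probability proportional to 1/p(u_n|b_n).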
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z)
- Math Word Problem Solving by Generating Linguistic Variants of Problem Statements [1.742186232261139]
We propose a framework for MWP solvers based on the generation of linguistic variants of the problem text.
The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes.
We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model.
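The variant-voting step can be sketched as below; the solver and the variant list are stand-ins for whatever model and paraphrase generator are used:

```python
from collections import Counter

def vote_on_variants(problem_variants, solver):
    """Solve each linguistic variant of the problem independently and
    elect the predicted expression that receives the most votes."""
    predictions = [solver(v) for v in problem_variants]
    expression, votes = Counter(predictions).most_common(1)[0]
    return expression, votes

# Stand-in solver: pretend three of four paraphrases yield the same
# expression and one yields a spurious alternative.
fake_predictions = iter(["3+5*2", "3+5*2", "(3+5)*2", "3+5*2"])
solver = lambda variant: next(fake_predictions)

variants = ["v1", "v2", "v3", "v4"]
print(vote_on_variants(variants, solver))  # → ('3+5*2', 3)
```

Majority voting makes the final answer robust to a single variant being mis-solved, which is the robustness gain the abstract reports.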
arXiv Detail & Related papers (2023-06-24T08:27:39Z)
- Learning by Analogy: Diverse Questions Generation in Math Word Problem [21.211970350827183]
Solving math word problems (MWPs) with AI techniques has recently made great progress thanks to the success of deep neural networks (DNNs).
We argue that the ability to learn by analogy is essential for an MWP solver to better understand the same problem, which may be formulated in diverse ways.
In this paper, we make a first attempt to solve MWPs by generating diverse yet consistent questions/equations.
arXiv Detail & Related papers (2023-06-15T11:47:07Z)
- SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases [27.56143777363971]
We propose a new debiasing method Sparse Mixture-of-Adapters (SMoA), which can mitigate multiple dataset biases effectively and efficiently.
Experiments on Natural Language Inference and Paraphrase Identification tasks demonstrate that SMoA outperforms full-finetuning, adapter tuning baselines, and prior strong debiasing methods.
arXiv Detail & Related papers (2023-02-28T08:47:20Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.