Generate & Rank: A Multi-task Framework for Math Word Problems
- URL: http://arxiv.org/abs/2109.03034v1
- Date: Tue, 7 Sep 2021 12:21:49 GMT
- Title: Generate & Rank: A Multi-task Framework for Math Word Problems
- Authors: Jianhao Shen, Yichun Yin, Lin Li, Lifeng Shang, Xin Jiang, Ming Zhang,
Qun Liu
- Abstract summary: Math word problem (MWP) solving is a challenging and critical task in natural language processing.
We propose Generate & Rank, a multi-task framework based on a generative pre-trained language model.
By joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions.
- Score: 48.99880318686938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Math word problem (MWP) solving is a challenging and critical task in natural
language processing. Many recent studies formalize MWP solving as a generation task and
have adopted sequence-to-sequence models to transform problem descriptions to
mathematical expressions. However, mathematical expressions are prone to minor
mistakes while the generation objective does not explicitly handle such
mistakes. To address this limitation, we devise a new ranking task for MWP and
propose Generate & Rank, a multi-task framework based on a generative
pre-trained language model. By joint training with generation and ranking, the
model learns from its own mistakes and is able to distinguish between correct
and incorrect expressions. Meanwhile, we apply a tree-based disturbance specially
designed for MWPs and an online update mechanism to boost the ranker. We demonstrate
the effectiveness of the proposed method on benchmark datasets, and the results show
that it consistently outperforms the baselines on all of them. In particular, on the
classical Math23k dataset, our method is 7 percentage points (78.4% $\rightarrow$
85.4%) higher than the state of the art.
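As a rough illustration of the multi-task idea in the abstract, the sketch below combines a generation loss and a ranking loss on a shared encoder. It is not the authors' implementation: the paper fine-tunes a pre-trained seq2seq model with beam-search and tree-disturbed candidates, whereas the module, shapes, and the 0.5 loss weight here are placeholder assumptions.

```python
# Illustrative multi-task training step: one shared encoder, a generation head and a
# ranking head, and a weighted sum of the two losses. Sizes and the 0.5 weight are
# assumptions; the paper itself builds on a pre-trained seq2seq model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenerateAndRankSketch(nn.Module):
    def __init__(self, vocab_size=5000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.gen_head = nn.Linear(hidden, vocab_size)   # predicts an expression token
        self.rank_head = nn.Linear(hidden, 1)           # scores a (problem, candidate) pair

    def forward(self, problem_ids, candidate_ids):
        tokens = torch.cat([problem_ids, candidate_ids], dim=1)
        _, h = self.encoder(self.embed(tokens))
        h = h[-1]                                       # final hidden state per example
        return self.gen_head(h), self.rank_head(h).squeeze(-1)

model = GenerateAndRankSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: problem tokens, candidate-expression tokens, a gold next token for the
# generation task, and 0/1 labels saying whether the candidate yields the right answer.
problem = torch.randint(0, 5000, (4, 20))
candidate = torch.randint(0, 5000, (4, 8))
gold_token = torch.randint(0, 5000, (4,))
is_correct = torch.tensor([1.0, 0.0, 1.0, 0.0])

gen_logits, rank_scores = model(problem, candidate)
loss = (F.cross_entropy(gen_logits, gold_token)
        + 0.5 * F.binary_cross_entropy_with_logits(rank_scores, is_correct))
opt.zero_grad(); loss.backward(); opt.step()
```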
Related papers
- Solving Math Word Problems with Reexamination [27.80592576792461]
We propose a model-agnostic pseudo-dual (PseDual) learning scheme to model the reexamination process.
The pseudo-dual task is defined as filling the numbers in the predicted expression back into the original word problem, whose numbers have been masked.
Empirical studies show that the pseudo-dual learning scheme is effective when integrated into several representative MWP solvers.
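A minimal sketch of how such a pseudo-dual example might be constructed, assuming a simple regex-based number masker; the function name, the `[NUM]` token, and the input format are illustrative assumptions rather than the PseDual authors' preprocessing.

```python
# Hypothetical construction of a pseudo-dual example: mask the numbers in the problem
# text and ask the auxiliary task to restore them given the expression.
import re

NUM = re.compile(r"\d+\.?\d*")

def build_pseudo_dual_example(problem, expression):
    numbers = NUM.findall(problem)              # the fill-in targets, in reading order
    masked = NUM.sub("[NUM]", problem)          # problem text with numbers masked
    return {"input": f"{expression} ; {masked}", "targets": numbers}

print(build_pseudo_dual_example(
    "Tom has 3 apples and buys 5 more. How many apples does he have?",
    "3 + 5",
))
# {'input': '3 + 5 ; Tom has [NUM] apples and buys [NUM] more. How many apples does he have?',
#  'targets': ['3', '5']}
```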
arXiv Detail & Related papers (2023-10-14T14:23:44Z)
- MWPRanker: An Expression Similarity Based Math Word Problem Retriever [12.638925774492403]
Math Word Problems (MWPs) in online assessments help test a learner's ability to make critical inferences.
In this work, we propose a tool for MWP retrieval.
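The sketch below illustrates one way expression-similarity retrieval could work: candidate problems are matched by the operator structure of their solution expressions, with concrete numbers abstracted away. It relies on Python's `ast` module as a stand-in expression parser and is not the MWPRanker implementation.

```python
# Stand-in for expression-similarity retrieval: two problems match if their solution
# expressions share the same operator structure once numbers are abstracted away.
import ast

def expression_skeleton(expr):
    """Parse an arithmetic expression and replace every constant with 'N'."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant):
            node.value = "N"
    return ast.dump(tree.body)

def retrieve(query_expr, corpus):
    """Return the problems whose expression skeleton matches the query's."""
    key = expression_skeleton(query_expr)
    return [text for text, expr in corpus if expression_skeleton(expr) == key]

corpus = [("Ann has 2 pens and buys 7 more. How many pens now?", "2 + 7"),
          ("A train travels 60 km in 2 hours. What is its speed?", "60 / 2")]
print(retrieve("3 + 5", corpus))   # matches only the addition problem
```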
arXiv Detail & Related papers (2023-07-03T15:44:18Z)
- Math Word Problem Solving by Generating Linguistic Variants of Problem Statements [1.742186232261139]
We propose a framework for MWP solvers based on the generation of linguistic variants of the problem text.
The approach involves solving each of the variant problems and selecting the predicted expression that receives the majority of the votes.
We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model.
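A minimal sketch of the voting step described above, with a stub solver standing in for the actual model; the variant-generation and solving components are assumptions.

```python
# Sketch of majority voting over linguistic variants: solve each paraphrase
# independently and keep the most frequent predicted expression.
from collections import Counter

def vote(variants, solve):
    predictions = [solve(v) for v in variants]
    return Counter(predictions).most_common(1)[0][0]

# Toy usage: a stub "solver" that disagrees on one paraphrase still yields "3 + 5".
stub_solver = {"variant 1": "3 + 5", "variant 2": "3 + 5", "variant 3": "5 - 3"}.get
print(vote(["variant 1", "variant 2", "variant 3"], stub_solver))
```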
arXiv Detail & Related papers (2023-06-24T08:27:39Z)
- Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning [10.889271604723312]
Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
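A rough sketch of the dynamic-prompting idea: the training problems most similar to the query are retrieved and their programs are placed in the prompt as exemplars. The similarity measure (difflib) and the prompt layout are illustrative choices, not the paper's setup.

```python
# Sketch of dynamic program prompting: pick the most similar solved training problems
# and put their programs in the prompt as few-shot exemplars.
from difflib import SequenceMatcher

def build_prompt(question, train_set, k=2):
    similarity = lambda ex: SequenceMatcher(None, question, ex["question"]).ratio()
    shots = sorted(train_set, key=similarity, reverse=True)[:k]
    demo = "\n\n".join(f"Q: {ex['question']}\nProgram: {ex['program']}" for ex in shots)
    return f"{demo}\n\nQ: {question}\nProgram:"

train_set = [
    {"question": "Sam has 4 marbles and finds 6 more. How many now?", "program": "ans = 4 + 6"},
    {"question": "A car goes 120 km in 3 hours. What is its speed?", "program": "ans = 120 / 3"},
]
print(build_prompt("Lily has 2 books and buys 9 more. How many books?", train_set, k=1))
```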
arXiv Detail & Related papers (2023-05-29T16:01:40Z)
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning [150.17907456113537]
We present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 grade-level problems that require mathematical reasoning.
We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting.
We propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data.
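The sketch below shows the general shape of policy-gradient example selection: a small policy scores candidate in-context examples, one is sampled, and a reward reflecting answer correctness reinforces that choice via REINFORCE. The feature dimension, reward stub, and linear policy are placeholder assumptions, not the PromptPG implementation.

```python
# REINFORCE-style selection of an in-context example: score candidates, sample one,
# and reinforce the choice with a (placeholder) correctness reward.
import torch
import torch.nn as nn

policy = nn.Linear(16, 1)                        # scores each candidate example
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
candidate_feats = torch.randn(8, 16)             # features of 8 candidate examples

def reward(idx):                                 # placeholder: pretend example 2 works best
    return 1.0 if idx == 2 else 0.0

for _ in range(200):
    logits = policy(candidate_feats).squeeze(-1)
    dist = torch.distributions.Categorical(logits=logits)
    choice = dist.sample()                       # pick one example to put in the prompt
    loss = -dist.log_prob(choice) * reward(choice.item())   # REINFORCE update
    opt.zero_grad(); loss.backward(); opt.step()
```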
arXiv Detail & Related papers (2022-09-29T08:01:04Z)
- Unbiased Math Word Problems Benchmark for Mitigating Solving Bias [72.8677805114825]
Current solvers exhibit a solving bias that consists of data bias and learning bias, caused by biased datasets and improper training strategies.
Our experiments verify that MWP solvers are easily biased by training datasets that do not cover diverse questions for each problem narrative.
An MWP can naturally be solved by multiple equivalent equations, while current datasets take only one of the equivalent equations as the ground truth.
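One common way to sidestep the single-ground-truth issue mentioned above is answer-based evaluation, sketched below: a prediction counts as correct if it evaluates to the gold answer, so any equivalent equation is accepted. This is an illustrative check, not the benchmark's protocol, and `eval` on model output should be sandboxed in practice.

```python
# Answer-based correctness check: any equation that evaluates to the gold answer is
# accepted, so equivalent equations are not penalised. Illustrative only.
def is_correct(predicted_expr, gold_answer, tol=1e-6):
    try:
        return abs(eval(predicted_expr) - gold_answer) < tol
    except Exception:
        return False

print(is_correct("3 + 5", 8))   # True
print(is_correct("5 + 3", 8))   # True, although the string differs from "3 + 5"
print(is_correct("5 - 3", 8))   # False
```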
arXiv Detail & Related papers (2022-05-17T06:07:04Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose unified multimodal pre-training for both vision-language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints [37.493809561634386]
We study the problem of generating arithmetic math word problems (MWPs) given a math equation.
Existing approaches are prone to generating MWPs that are mathematically invalid or have unsatisfactory language quality.
arXiv Detail & Related papers (2021-09-09T20:24:25Z)
- MWP-BERT: A Strong Baseline for Math Word Problems [47.51572465676904]
Math word problem (MWP) solving is the task of transforming a sequence of natural language problem descriptions to executable math equations.
Although recent sequence-modeling MWP solvers have gained credit for math-text contextual understanding, pre-trained language models (PLMs) have not been explored for solving MWPs.
We introduce MWP-BERT to obtain pre-trained token representations that capture the alignment between text description and mathematical logic.
arXiv Detail & Related papers (2021-07-28T15:28:41Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
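A tiny illustration of the kind of probe this line of work relies on: shuffling word order destroys syntax while preserving bag-of-words co-occurrence statistics, so contrasting model behaviour on original versus shuffled text helps separate the two explanations. The snippet is illustrative only, not the paper's experimental pipeline.

```python
# Word-order probe: shuffling a sentence destroys syntax but keeps its bag-of-words
# statistics intact.
import random

def shuffle_words(sentence, seed=0):
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

print(shuffle_words("the model learns syntactic structure from raw text"))
```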
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.