Dynamic Prompt Learning via Policy Gradient for Semi-structured
Mathematical Reasoning
- URL: http://arxiv.org/abs/2209.14610v1
- Date: Thu, 29 Sep 2022 08:01:04 GMT
- Title: Dynamic Prompt Learning via Policy Gradient for Semi-structured
Mathematical Reasoning
- Authors: Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay
Rajpurohit, Peter Clark, Ashwin Kalyan
- Abstract summary: We present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 grade-level problems that require mathematical reasoning.
We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting.
We propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data.
- Score: 150.17907456113537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical reasoning, a core ability of human intelligence, presents unique
challenges for machines in abstract thinking and logical reasoning. Recent
large pre-trained language models such as GPT-3 have achieved remarkable
progress on mathematical reasoning tasks written in text form, such as math
word problems (MWP). However, it is unknown if the models can handle more
complex problems that involve math reasoning over heterogeneous information,
such as tabular data. To fill the gap, we present Tabular Math Word Problems
(TabMWP), a new dataset containing 38,431 open-domain grade-level problems that
require mathematical reasoning on both textual and tabular data. Each question
in TabMWP is aligned with a tabular context, which is presented as an image,
semi-structured text, and a structured table. There are two types of questions:
free-text and multi-choice, and each problem is annotated with gold solutions
to reveal the multi-step reasoning process. We evaluate different pre-trained
models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier
studies suggest, since few-shot GPT-3 relies on the selection of in-context
examples, its performance is unstable and can degrade to near chance. This
instability is more severe when handling complex problems like TabMWP. To
mitigate this, we further propose a novel approach, PromptPG, which utilizes
policy gradient to learn to select in-context examples from a small amount of
training data and then constructs the corresponding prompt for the test
example. Experimental results show that our method outperforms the best
baseline by 5.31% on the accuracy metric and reduces the prediction variance
significantly compared to random selection, which verifies its effectiveness in
the selection of in-context examples.
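As a rough illustration of the selection idea described in the abstract, the following toy sketch trains a softmax policy with a REINFORCE-style policy gradient update so that it prefers the candidate example that most often earns a reward. It is not the authors' PromptPG implementation: the one-logit-per-candidate policy, the running-mean baseline, the function names (`train_selector`, `toy_reward`), and the made-up success probabilities are all assumptions chosen for illustration. In a real setting the reward would presumably come from checking a language model's answer on a training problem.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(num_candidates, reward_fn, steps=2000, lr=0.1, seed=0):
    """REINFORCE over a softmax policy with one logit per candidate example.

    Assumption: a real selector would condition on the problem being solved;
    this toy policy ignores the problem and only learns a global preference.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_candidates
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one candidate index from the current policy.
        choice = rng.choices(range(num_candidates), weights=probs, k=1)[0]
        reward = reward_fn(choice, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_j of log pi(choice) is 1[j == choice] - probs[j].
        for j in range(num_candidates):
            grad = (1.0 if j == choice else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


def toy_reward(choice, rng):
    """Hypothetical stand-in for 'the model answered correctly with this example'."""
    helpfulness = [0.2, 0.3, 0.9, 0.4]  # made-up success probabilities
    return 1.0 if rng.random() < helpfulness[choice] else -1.0


if __name__ == "__main__":
    learned = train_selector(4, toy_reward)
    print([round(p, 3) for p in learned])
```

Run as a script, it prints the learned selection probabilities. With the made-up rewards above, most of the probability mass should end up on the candidate with the highest success rate, in contrast to random selection, which keeps sampling unhelpful candidates.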
Related papers
- Parameterizing Context: Unleashing the Power of Parameter-Efficient
Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing [13.51721352349583]
This paper introduces a novel method integrating parameter-efficient fine-tuning (PEFT) and in-context tuning (ICT) for training a continual table semantic parser.
The teacher addresses the few-shot problem using ICT, which procures contextual information by demonstrating a few training examples.
In turn, the student leverages the proposed PEFT framework to learn from the teacher's output distribution, and subsequently compresses and saves the contextual information to the prompts, eliminating the need to store any training examples.
Date: 2023-10-07T13:40:41Z
- Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning [10.889271604723312]
Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
Date: 2023-05-29T16:01:40Z
- Textual Enhanced Contrastive Learning for Solving Math Word Problems [23.196339273292246]
We propose a Textual Enhanced Contrastive Learning framework, which enforces the models to distinguish semantically similar examples.
We adopt a self-supervised strategy to enrich examples with subtle textual variance.
Experimental results show that our method achieves state-of-the-art results on both widely used benchmark datasets and carefully designed challenge datasets in English and Chinese.
Date: 2022-11-29T08:44:09Z
- ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models [82.63962107729994]
Any-Shot Data-to-Text (ASDOT) is a new approach flexibly applicable to diverse settings.
It consists of two steps, data disambiguation and sentence fusion.
Experimental results show that ASDOT consistently achieves significant improvement over baselines.
Date: 2022-10-09T19:17:43Z
- PTab: Using the Pre-trained Language Model for Modeling Tabular Data [5.791972449406902]
Recent studies show that neural-based models are effective in learning contextual representations for tabular data.
We propose a novel framework PTab, using the Pre-trained language model to model Tabular data.
Our method has achieved a better average AUC score in supervised settings compared to the state-of-the-art baselines.
Date: 2022-09-15T08:58:42Z
- Unbiased Math Word Problems Benchmark for Mitigating Solving Bias [72.8677805114825]
Current solvers exhibit solving bias, which consists of data bias and learning bias arising from biased datasets and improper training strategies.
Our experiments verify that MWP solvers are easily biased by training datasets that do not cover diverse questions for each problem narrative.
An MWP can be naturally solved by multiple equivalent equations while current datasets take only one of the equivalent equations as ground truth.
Date: 2022-05-17T06:07:04Z
- Generate & Rank: A Multi-task Framework for Math Word Problems [48.99880318686938]
Solving math word problems (MWPs) is a challenging and critical task in natural language processing.
We propose Generate & Rank, a framework based on a generative pre-trained language model.
By joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions.
Date: 2021-09-07T12:21:49Z
- MWP-BERT: A Strong Baseline for Math Word Problems [47.51572465676904]
Math word problem (MWP) solving is the task of transforming a sequence of natural language problem descriptions to executable math equations.
Although recent sequence-modeling MWP solvers have earned credit for math-text contextual understanding, pre-trained language models (PLMs) have not been explored for solving MWPs.
We introduce MWP-BERT to obtain pre-trained token representations that capture the alignment between text description and mathematical logic.
Date: 2021-07-28T15:28:41Z
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
Date: 2020-05-28T17:29:03Z
This list is automatically generated from the titles and abstracts of the papers on this site.