Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning
- URL: http://arxiv.org/abs/2305.18170v2
- Date: Fri, 9 Jun 2023 05:00:30 GMT
- Title: Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning
- Authors: Zhanming Jie, Wei Lu
- Abstract summary: Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
- Score: 10.889271604723312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-thought (CoT) prompting with large language models has proven
effective in numerous natural language processing tasks, but designing prompts
that generalize well to diverse problem types can be challenging, especially in
the context of math word problem (MWP) solving. Additionally, it is common to
have a large amount of training data that covers more diverse problem types but
lacks CoT annotations, which limits the use of supervised learning
techniques. To address these issues, we investigate two approaches to leverage
the training data in a few-shot prompting scenario: dynamic program prompting
and program distillation. Our approach is largely inspired by Gao et al.
(2022), who proposed replacing the CoT with programs as the
intermediate reasoning step. Such a prompting strategy allows us to accurately
verify the answer correctness through program execution in MWP solving. Our
dynamic program prompting involves annotating the training data by sampling
correct programs from a large language model, while program distillation
involves adapting a smaller model to the program-annotated training data. Our
experiments on three standard MWP datasets demonstrate the effectiveness of
these approaches, yielding significant improvements over previous baselines for
prompting and fine-tuning. Our results suggest that leveraging a large amount
of training data can improve the generalization ability of prompts and boost
the performance of fine-tuned small models in MWP solving.
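The annotation step described in the abstract (sample programs, execute them, keep one whose output matches the known answer) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hard-coded `candidates` list stands in for programs sampled from a large language model, the convention that each program defines a `solution()` function is an assumption, and `run_program` and `first_correct_program` are hypothetical helper names.

```python
import math


def run_program(source):
    """Execute a candidate program and return solution()'s result, or None on failure."""
    namespace = {}
    try:
        # Assumption: each candidate defines a zero-argument solution() function.
        # exec is only acceptable for trusted text; real model output needs a sandbox.
        exec(source, namespace)
        return namespace["solution"]()
    except Exception:
        return None


def first_correct_program(candidates, gold_answer, tol=1e-4):
    """Return the first candidate whose executed result matches the gold answer."""
    for source in candidates:
        result = run_program(source)
        if isinstance(result, (int, float)) and math.isclose(
            result, gold_answer, abs_tol=tol
        ):
            return source
    return None  # no sampled program reproduced the answer; leave the problem unannotated


# Hypothetical samples for: "Tom has 3 apples and buys 4 more. How many does he have?"
candidates = [
    "def solution():\n    apples = 3\n    bought = 4\n    return apples * bought\n",
    "def solution():\n    apples = 3\n    bought = 4\n    return apples + bought\n",
]
annotated = first_correct_program(candidates, gold_answer=7)
```

Here the first candidate is rejected because it evaluates to 12, and the second is kept as the program annotation for the training problem. Only the final answer is needed as supervision, which is why execution-based checking suits training sets that lack CoT annotations.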
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving [0.0]
This study aims to provide MWP solvers with a more diverse training set, ultimately improving their ability to solve various math problems.
We propose several methods for data augmentation that modify the problem texts and equations, such as synonym replacement, rule-based question replacement, and rule-based question reversal.
This study extends these methods by introducing a new in-context learning augmentation method that employs the Llama-7b language model.
arXiv Detail & Related papers (2024-04-05T07:57:03Z)
- From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision [12.023661884821554]
We introduce an innovative two-stage framework that adeptly transfers mathematical expertise from large to tiny language models.
Our method fully leverages semantic understanding capabilities when searching for 'problem-equation' pairs.
It demonstrates significantly improved performance on the Math23K and Weak12K datasets compared to existing small model methods.
arXiv Detail & Related papers (2024-03-21T13:29:54Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy [68.31760483418901]
Large Language Models (LLMs) struggle to provide current information because their pre-training data is outdated.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in generalizability of new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- Improving Multi-task Learning via Seeking Task-based Flat Regions [38.28600737969538]
Multi-Task Learning (MTL) is a powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone.
There is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction.
We propose to leverage a recently introduced training method, named Sharpness-aware Minimization, which can enhance model generalization ability on single-task learning.
arXiv Detail & Related papers (2022-11-24T17:19:30Z)
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
- WARM: A Weakly (+Semi) Supervised Model for Solving Math word Problems [21.501567886241087]
Solving math word problems (MWPs) is an important and challenging problem in natural language processing.
We propose a weakly supervised model for solving MWPs by requiring only the final answer as supervision.
We demonstrate that our approach achieves accuracy gains of 4.5% and 32% over the state-of-the-art weakly supervised approach.
arXiv Detail & Related papers (2021-04-14T09:25:38Z)