Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with
Customized Exercise Generation
- URL: http://arxiv.org/abs/2305.14386v1
- Date: Mon, 22 May 2023 17:36:14 GMT
- Title: Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with
Customized Exercise Generation
- Authors: Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang
Zhang, Ashwin Kalyan
- Abstract summary: We present a novel approach for distilling math word problem solving capabilities from large language models (LLMs) into smaller, more efficient student models.
Our approach is designed to consider the student model's weaknesses and foster a tailored learning experience by generating targeted exercises aligned with educational science principles.
- Score: 39.282695549919495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel approach for distilling math word problem
solving capabilities from large language models (LLMs) into smaller, more
efficient student models. Our approach is designed to consider the student
model's weaknesses and foster a tailored learning experience by generating
targeted exercises aligned with educational science principles, such as
knowledge tracing and personalized learning. Concretely, we let GPT-3 be a math
tutor and run two steps iteratively: 1) assessing the student model's current
learning status on a GPT-generated exercise book, and 2) improving the student
model by training it with tailored exercise samples generated by GPT-3.
Experimental results reveal that our approach outperforms LLMs (e.g., GPT-3 and
PaLM) in accuracy across three distinct benchmarks while employing
significantly fewer parameters. Furthermore, we provide a comprehensive
analysis of the various components within our methodology to substantiate their
efficacy.
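The two-step tutoring loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: `assess`, `tutor_loop`, and the `gpt_generate` callable are hypothetical placeholders standing in for the exercise-book evaluation and the GPT-3 exercise-generation call.

```python
def assess(student_model, exercise_book):
    """Step 1: probe the student model on a GPT-generated exercise book
    and record the problems it fails (its current weaknesses)."""
    weaknesses = []
    for problem, answer in exercise_book:
        if student_model(problem) != answer:
            weaknesses.append((problem, answer))
    return weaknesses

def tutor_loop(student_model, train_step, gpt_generate, exercise_book, rounds=3):
    """Iterate assessment (step 1) and targeted training (step 2).
    `gpt_generate` stands in for a GPT-3 call that produces new
    exercises tailored to the failed ones."""
    for _ in range(rounds):
        weaknesses = assess(student_model, exercise_book)
        if not weaknesses:
            break  # no remaining weaknesses to target
        tailored = gpt_generate(weaknesses)   # customized exercise samples
        train_step(student_model, tailored)   # step 2: update the student
    return student_model
```

In this sketch the loop terminates early once the exercise book is fully solved, mirroring the paper's idea that generation is conditioned on the student's current learning status rather than drawn uniformly.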
Related papers
- Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology [17.91379291654773]
This work explores using Large Language Models (LLMs) as teachers to generate effective hints for students simulated through LLMs.
The results show that model errors increase with higher temperature settings.
Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o.
arXiv Detail & Related papers (2024-11-05T20:18:53Z)
- Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses [0.0]
This study aims to explore the potential of Large Language Models (LLMs) in facilitating automated feedback in math education.
We employ Mistral, a version of Llama catered to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems.
We evaluate the model's performance in scoring accuracy and the quality of its feedback by utilizing judgments from two teachers.
arXiv Detail & Related papers (2024-10-29T16:57:45Z)
- LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement [93.38736019287224]
The "LLMs-as-Instructors" framework autonomously enhances the training of smaller target models.
Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model.
Within this framework, we implement two strategies: "Learning from Error", which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors.
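The difference between the two data-tailoring strategies above can be sketched with simple (problem, response, gold-answer) records. This is an illustrative sketch under that assumed record format; the function names are not from the paper.

```python
def learning_from_error(records):
    """'Learning from Error': keep only problems the target model got
    wrong and turn each into a corrective training example."""
    return [(p, gold) for p, resp, gold in records if resp != gold]

def learning_from_error_by_contrast(records):
    """'Learning from Error by Contrast': pair a correct and an
    incorrect response to the same problem so a contrastive
    objective can compare them."""
    by_problem = {}
    for p, resp, gold in records:
        bucket = by_problem.setdefault(p, {"pos": [], "neg": []})
        bucket["pos" if resp == gold else "neg"].append(resp)
    pairs = []
    for p, d in by_problem.items():
        for pos in d["pos"]:
            for neg in d["neg"]:
                pairs.append((p, pos, neg))  # (problem, correct, incorrect)
    return pairs
```

The first strategy yields plain supervised examples from failures only; the second yields (problem, correct, incorrect) triples suitable for a contrastive loss.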
arXiv Detail & Related papers (2024-06-29T17:16:04Z)
- Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of their future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
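The instruction-optimization idea above — one LM proposing materials while another LM's judgment serves as the reward — can be sketched as a simple best-of-n search loop. This is a hypothetical illustration; `generator_lm` and `judge_lm` are placeholder callables, not APIs from the paper.

```python
def optimize_instructions(generator_lm, judge_lm, topic, n_candidates=4, steps=3):
    """One LM proposes instructional materials; another LM scores them
    (its judgment acting as the reward), and the best candidate so far
    seeds the next round of generation."""
    best, best_score = None, float("-inf")
    for _ in range(steps):
        # Propose candidates, optionally conditioned on the current best.
        candidates = [generator_lm(topic, seed=best) for _ in range(n_candidates)]
        for c in candidates:
            score = judge_lm(c)  # LM judgment used as the reward signal
            if score > best_score:
                best, best_score = c, score
    return best
```

Any reward-maximizing search (beam search, rejection sampling, RL) could replace this greedy loop; the key design choice is that the reward is another model's judgment rather than a human label.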
arXiv Detail & Related papers (2024-03-05T09:09:15Z) - Assessing the Impact of Prompting Methods on ChatGPT's Mathematical
Capabilities [5.362057681411727]
This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs).
We conduct this analysis on OpenAI's LLM, ChatGPT-3.5, on extensive problem sets from the MATH, GSM8K, and MMLU datasets.
Contrary to expectations, our empirical analysis reveals that none of the investigated methods consistently improves over ChatGPT-3.5's baseline performance.
arXiv Detail & Related papers (2023-12-22T17:39:40Z) - Teaching Language Models to Self-Improve through Interactive Demonstrations [83.9421355808174]
The self-improvement ability of large language models has been shown to be absent from, and difficult to learn for, smaller models.
We introduce TriPosT, a training algorithm that endows smaller models with such self-improvement ability.
We show that our approach can improve LLaMA-7b's performance on math and reasoning tasks by up to 7.13%.
arXiv Detail & Related papers (2023-10-20T14:11:04Z) - Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again [24.150464908060112]
We present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs.
Our results show that GPT-3 still significantly underperforms compared with simply fine-tuning a smaller PLM using the same small training set.
arXiv Detail & Related papers (2022-03-16T05:56:08Z) - Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.