Can Language Models Employ the Socratic Method? Experiments with Code
Debugging
- URL: http://arxiv.org/abs/2310.03210v1
- Date: Wed, 4 Oct 2023 23:32:33 GMT
- Title: Can Language Models Employ the Socratic Method? Experiments with Code
Debugging
- Authors: Erfan Al-Hossami, Razvan Bunescu, Justin Smith, Ryan Teehan
- Abstract summary: This paper introduces a dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems.
The dataset is then used for benchmarking the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain of thought prompting of the much larger GPT-4.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When employing the Socratic method of teaching, instructors guide students
toward solving a problem on their own rather than providing the solution
directly. While this strategy can substantially improve learning outcomes, it
is usually time-consuming and cognitively demanding. Automated Socratic
conversational agents can augment human instruction and provide the necessary
scale, however their development is hampered by the lack of suitable data for
training and evaluation. In this paper, we introduce a manually created dataset
of multi-turn Socratic advice that is aimed at helping a novice programmer fix
buggy solutions to simple computational problems. The dataset is then used for
benchmarking the Socratic debugging abilities of a number of language models,
ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5
to zero-shot and chain of thought prompting of the much larger GPT-4. The code
and datasets are made freely available for research at the link below.
https://github.com/taisazero/socratic-debugging-benchmark
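As an illustration of the prompting setup the paper benchmarks, below is a minimal sketch of zero-shot Socratic-debugging prompting with the OpenAI chat completions API. The system-prompt wording, helper name, and message layout are illustrative assumptions, not the authors' exact protocol or the released dataset's schema.

```python
# Minimal sketch of zero-shot Socratic-debugging prompting (assumed setup,
# not the authors' exact protocol). Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a Socratic programming tutor. Do not reveal the fix directly; "
    "ask one short guiding question that helps the student locate the bug "
    "in their own code."
)

def socratic_turn(problem: str, buggy_code: str, dialogue: list[dict]) -> str:
    """Generate the tutor's next utterance for a multi-turn debugging dialogue.

    `dialogue` holds prior turns as chat messages, e.g.
    [{"role": "assistant", "content": "..."}, {"role": "user", "content": "..."}].
    """
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Problem description:\n{problem}\n\nBuggy solution:\n{buggy_code}"},
        *dialogue,
    ]
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content
```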
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
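A hypothetical sketch of the critic-style quality-control step this summary describes: execute a model-generated Python solution and keep the question-code pair only if it reproduces the reference answer. The function and variable names are illustrative, not taken from the paper.

```python
# Hypothetical code-based critic check (illustrative names, not SIaM's code):
# run a generated solution and keep it only if it reproduces the reference.
def critic_accepts(generated_code: str, reference_answer: str) -> bool:
    """Execute candidate code in a fresh namespace and compare the value it
    binds to `answer` against the reference answer (string-normalized)."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # NOTE: sandbox this in real use
    except Exception:
        return False  # crashing code fails quality control
    return str(namespace.get("answer")).strip() == str(reference_answer).strip()
```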
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching [28.770954139539946]
This paper focuses on improving the capability of mathematics teaching via a Socratic teaching-based LLM (SocraticLLM).
We collect and release a high-quality mathematical teaching dataset, named SocraticMATH, which provides Socratic-style conversations about problems with extra knowledge.
Also, we propose a knowledge-enhanced LLM as a strong baseline to generate reliable responses with review, guidance/heuristic, rectification, and summarization.
arXiv Detail & Related papers (2024-07-24T15:18:17Z)
- OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning.
We propose OptiBench, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
- A GPT-based Code Review System for Programming Language Learning [0.0]
This research proposes a system that employs GPT-4 to offer learner-friendly code reviews and minimize the risk of AI-assisted cheating.
The improved system was evaluated by software education experts on four criteria: the strictness of its code correctness checks, response time, API call cost, and the quality of its code reviews.
arXiv Detail & Related papers (2024-06-21T12:16:01Z)
- Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging [27.70379206820154]
Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving.
TreeInstruct asks probing questions to help students independently identify and resolve errors.
It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state.
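A rough sketch of what such state-based question-tree planning could look like; the class and field names below are hypothetical reconstructions from this summary, not TreeInstruct's actual implementation.

```python
# Hypothetical question-tree planner in the spirit of the summary above;
# names and structure are illustrative, not TreeInstruct's implementation.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QuestionNode:
    question: str                  # probing question posed to the student
    targets: set[str]              # misconceptions this question addresses
    children: list["QuestionNode"] = field(default_factory=list)

def next_question(node: QuestionNode, resolved: set[str]) -> Optional[QuestionNode]:
    """Depth-first walk that skips questions whose target misconceptions the
    student's responses show are already resolved."""
    if not node.targets <= resolved:   # something here is still unresolved
        return node
    for child in node.children:
        found = next_question(child, resolved)
        if found is not None:
            return found
    return None
```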
arXiv Detail & Related papers (2024-06-17T16:28:21Z)
- A Knowledge-Component-Based Methodology for Evaluating AI Assistants [9.412070852474313]
We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4.
This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises.
arXiv Detail & Related papers (2024-06-09T00:58:39Z)
- Improving Socratic Question Generation using Data Augmentation and Preference Optimization [2.1485350418225244]
Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students.
Existing methods that involve prompting these LLMs sometimes produce invalid outputs.
We propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways.
Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones.
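One standard way to realize such preference optimization is a DPO-style loss over (valid, invalid) question pairs. The sketch below assumes sequence log-probabilities have already been computed under the policy and a frozen reference model; the pairing scheme is inferred from this summary rather than taken from the paper.

```python
# DPO-style preference loss over (ground-truth, generated-invalid) Socratic
# question pairs; pairing scheme assumed from the summary, not the paper's code.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen: torch.Tensor, pi_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is a batch of sequence log-probs: `chosen` = ground-truth
    question, `rejected` = invalid generated question, scored by the trainable
    policy (pi_*) and a frozen reference model (ref_*)."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()
```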
arXiv Detail & Related papers (2024-03-01T00:08:20Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam solutions in a programming course offered by a tier 1 university.
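The few-shot classification step can be pictured as a prototypical-network computation: average the instructor-labeled support embeddings per feedback class, then assign each student solution to the nearest prototype. This is a generic sketch of that idea, not ProtoTransformer's exact architecture; the encoder producing the embeddings is assumed and omitted.

```python
# Generic prototypical-network sketch of feedback-as-few-shot-classification;
# the encoder that produces the embeddings is assumed and omitted here.
import torch

def class_prototypes(support_emb: torch.Tensor,
                     support_labels: torch.Tensor) -> torch.Tensor:
    """Mean-pool support embeddings (K, D) per feedback class -> (C, D)."""
    classes = support_labels.unique(sorted=True)
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in classes])

def predict_feedback(query_emb: torch.Tensor,
                     prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each query embedding (N, D) the index of its nearest prototype."""
    distances = torch.cdist(query_emb, prototypes)  # (N, C) Euclidean
    return distances.argmin(dim=1)
```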
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that learns a retrieval model alternately with the programmer, using only weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
- TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z)
- The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection [55.390442067381755]
We show that grayscale data can be automatically constructed without human effort.
Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators.
Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.
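Training on graded rather than binary labels can be expressed as a pairwise margin loss in which a higher-grade response must outscore a lower-grade one. The sketch below is a generic formulation of that idea under assumed tensor shapes, not the paper's exact training objective.

```python
# Generic pairwise margin loss for grayscale (graded) response ranking;
# tensor shapes are assumptions, not the paper's exact objective.
import torch
import torch.nn.functional as F

def grayscale_ranking_loss(scores: torch.Tensor, grades: torch.Tensor,
                           margin: float = 0.2) -> torch.Tensor:
    """scores/grades: (N,) matching-model scores and grayscale relevance
    grades for N candidate responses to the same dialogue context."""
    s_diff = scores.unsqueeze(1) - scores.unsqueeze(0)   # (N, N) score gaps
    higher = grades.unsqueeze(1) > grades.unsqueeze(0)   # i should outrank j
    losses = F.relu(margin - s_diff)[higher]
    return losses.mean() if losses.numel() > 0 else scores.sum() * 0.0
```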
arXiv Detail & Related papers (2020-04-06T06:34:54Z)