Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging
- URL: http://arxiv.org/abs/2406.11709v4
- Date: Thu, 07 Nov 2024 07:00:14 GMT
- Title: Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging
- Authors: Priyanka Kargupta, Ishika Agarwal, Dilek Hakkani-Tur, Jiawei Han
- Abstract summary: Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving.
TreeInstruct asks probing questions to help students independently identify and resolve errors.
It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state.
- Score: 27.70379206820154
- Abstract: Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving. The conversational capabilities of large language models (LLMs) show great potential for providing scalable, real-time student guidance. However, current LLMs often give away solutions directly, making them ineffective instructors. We tackle this issue in the code debugging domain with TreeInstruct, an Instructor agent guided by a novel state space-based planning algorithm. TreeInstruct asks probing questions to help students independently identify and resolve errors. It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state, effectively addressing both independent and dependent mistakes concurrently in a multi-turn interaction setting. In addition to using an existing single-bug debugging benchmark, we construct a more challenging multi-bug dataset of 150 coding problems, incorrect solutions, and bug fixes -- all carefully constructed and annotated by experts. Extensive evaluation shows TreeInstruct's state-of-the-art performance on both datasets, proving it to be a more effective instructor than baselines. Furthermore, a real-world case study with five students of varying skill levels further demonstrates TreeInstruct's ability to guide students to debug their code efficiently with minimal turns and highly Socratic questioning.
Related papers
- Konstruktor: A Strong Baseline for Simple Knowledge Graph Question Answering [60.6042489577575]
We introduce Konstruktor - an efficient and robust approach that breaks down the problem into three steps.
Our approach integrates language models and knowledge graphs, exploiting the power of the former and the interpretability of the latter.
We show that for relation detection, the most challenging step of the workflow, a combination of relation classification/generation and ranking outperforms other methods.
arXiv Detail & Related papers (2024-09-24T09:19:11Z)
- Effective Large Language Model Debugging with Best-first Tree Search [27.68711322875045]
Large Language Models (LLMs) show promise in code generation tasks.
However, LLMs cannot consistently spot and fix bugs in the code they generate.
We propose an algorithm to enable LLMs to debug their code via self-reflection and search where a model attempts to identify its previous mistakes.
arXiv Detail & Related papers (2024-07-26T19:26:00Z)
- A Knowledge-Component-Based Methodology for Evaluating AI Assistants [9.412070852474313]
We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4.
This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises.
arXiv Detail & Related papers (2024-06-09T00:58:39Z)
- KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions [63.307317584926146]
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents.
In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer.
We construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain.
arXiv Detail & Related papers (2024-03-06T17:16:44Z)
- Can Language Models Employ the Socratic Method? Experiments with Code Debugging [1.2776694801834354]
This paper introduces a dataset of multi-turn Socratic advice aimed at helping a novice programmer fix buggy solutions to simple computational problems.
The dataset is then used to benchmark the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer T5 to zero-shot and chain-of-thought prompting of the much larger GPT-4.
arXiv Detail & Related papers (2023-10-04T23:32:33Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Knowledge-enhanced Iterative Instruction Generation and Reasoning for Knowledge Base Question Answering [43.72266327778216]
Multi-hop Knowledge Base Question Answering aims to find the answer entity in a knowledge base which is several hops from the topic entity mentioned in the question.
Existing Retrieval-based approaches first generate instructions from the question and then use them to guide the multi-hop reasoning on the knowledge graph.
We run experiments on two multi-hop KBQA benchmarks and outperform the existing approaches, setting a new state of the art.
arXiv Detail & Related papers (2022-09-07T09:02:45Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- PalmTree: Learning an Assembly Language Model for Instruction Embedding [8.74990895782223]
We propose to pre-train an assembly language model called PalmTree for generating general-purpose instruction embeddings.
PalmTree has the best performance for intrinsic metrics, and outperforms the other instruction embedding schemes for all downstream tasks.
arXiv Detail & Related papers (2021-01-21T22:30:01Z)
- Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions.
We introduce a weakly-supervised paradigm for learning MWPs.
Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.