Open, Small, Rigmarole -- Evaluating Llama 3.2 3B's Feedback for Programming Exercises
- URL: http://arxiv.org/abs/2504.01054v1
- Date: Tue, 01 Apr 2025 17:24:39 GMT
- Title: Open, Small, Rigmarole -- Evaluating Llama 3.2 3B's Feedback for Programming Exercises
- Authors: Imen Azaiz, Natalie Kiesler, Sven Strickroth, Anni Zhang
- Abstract summary: Large Language Models (LLMs) have been subject to extensive research in the past few years. This study explores the feedback characteristics of the open, lightweight LLM Llama 3.2 (3B).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have been subject to extensive research in the past few years. This is particularly true for the potential of LLMs to generate formative programming feedback for novice learners at university. In contrast to Generative AI (GenAI) tools based on LLMs, such as GPT, smaller and open models have received much less attention. Yet, they offer several benefits, as educators can run them on a virtual machine or personal computer. This can help circumvent some major concerns applicable to other GenAI tools and LLMs (e.g., data protection, lack of control over changes, privacy). Therefore, this study explores the feedback characteristics of the open, lightweight LLM Llama 3.2 (3B). In particular, we investigate the model's responses to authentic student solutions to introductory programming exercises written in Java. The generated output is qualitatively analyzed to help evaluate the feedback's quality, content, structure, and other features. The results provide a comprehensive overview of the feedback capabilities and serious shortcomings of this open, small LLM. We further discuss the findings in the context of previous research on LLMs and contribute to benchmarking recently available GenAI tools and their feedback for novice learners of programming. Thereby, this work has implications for educators, learners, and tool developers attempting to utilize all variants of LLMs (including open and small models) to generate formative feedback and support learning.
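The local-deployment setup the abstract describes is easy to reproduce. Below is a minimal sketch, assuming the model is served by a local Ollama instance under the tag llama3.2:3b; the endpoint, the system prompt, and the example exercise are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: requesting formative feedback on a student's Java solution
# from a locally hosted Llama 3.2 (3B). Assumes Ollama is running locally
# with the model pulled ("ollama pull llama3.2:3b"); the prompt wording is
# an illustrative assumption, not the paper's protocol.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

EXERCISE = "Write a method that returns the sum of all even numbers in an int array."
STUDENT_CODE = """
public static int sumEven(int[] xs) {
    int sum = 0;
    for (int i = 0; i <= xs.length; i++) {  // off-by-one: should be i < xs.length
        if (xs[i] % 2 == 0) sum += xs[i];
    }
    return sum;
}
"""

def formative_feedback(exercise: str, code: str) -> str:
    """Ask the local model for feedback without revealing a full solution."""
    messages = [
        {"role": "system",
         "content": ("You are a tutor for novice Java programmers. Give "
                     "formative feedback: name each problem, explain why it "
                     "is a problem, and hint at a fix. Do not write the "
                     "corrected code.")},
        {"role": "user",
         "content": f"Exercise: {exercise}\n\nStudent solution:\n{code}"},
    ]
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.2:3b", "messages": messages, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(formative_feedback(EXERCISE, STUDENT_CODE))
```

Whether a 3B model reliably flags the planted off-by-one error, without leaking a full solution, is exactly the kind of behavior the study analyzes qualitatively.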
Related papers
- Junior Software Developers' Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review [17.22501688824729]
This paper provides an overview of junior software developers' perspectives and use of Large Language Model-based tools for software engineering (LLM4SE). We conducted a systematic literature review following guidelines by Kitchenham et al. on 56 primary studies. Only 8.9% of the studies provide a clear definition for junior software developers, and there is no uniformity.
arXiv Detail & Related papers (2025-03-10T17:25:24Z)
- Prompt-Based Cost-Effective Evaluation and Operation of ChatGPT as a Computer Programming Teaching Assistant [0.0]
This article studies three aspects of using ChatGPT as a computer programming teaching assistant. The performance of two well-known models, GPT-3.5T and GPT-4T, in providing feedback to students is evaluated.
arXiv Detail & Related papers (2025-01-24T08:15:05Z)
- Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask users clarifying questions whenever they encounter obstacles due to unclear instructions; a minimal sketch of such a loop appears after this list.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- Evaluating Language Models for Generating and Judging Programming Feedback [4.743413681603463]
Large language models (LLMs) have transformed research and practice across a wide range of domains.
We evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments.
arXiv Detail & Related papers (2024-07-05T21:44:11Z)
- Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems.
Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a tool-use correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE). STE orchestrates three key mechanisms behind successful tool-use behavior in biological systems: trial and error, imagination, and memory; a rough sketch of this loop also appears after this list.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [49.33633818046644]
We propose ToolEyes, a fine-grained system tailored for the evaluation of large language models' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning. ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world.
arXiv Detail & Related papers (2024-01-01T12:49:36Z)
- Next-Step Hint Generation for Introductory Programming Using Large Language Models [0.8002196839441036]
Large Language Models possess skills such as answering questions, writing essays or solving programming exercises.
This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints.
arXiv Detail & Related papers (2023-12-03T17:51:07Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability into smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [78.60644407028022]
We introduce MINT, a benchmark that evaluates large language models' ability to solve tasks with multi-turn interactions.
LLMs generally benefit from tools and language feedback, with performance gains of 1-8% for each turn of tool use.
Among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.
arXiv Detail & Related papers (2023-09-19T15:25:42Z)
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction [41.36474802204914]
GPT4Tools is based on self-instruct to enable open-source LLMs, such as LLaMA and OPT, to use tools.
It generates an instruction-following dataset by prompting an advanced teacher with various multi-modal contexts.
arXiv Detail & Related papers (2023-05-30T05:27:21Z)
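For the Ask-when-Needed (AwN) entry above, here is a loose reconstruction of the idea as a control loop, reusing the local-Ollama assumption from the first sketch. The QUESTION: convention and the clarification budget are illustrative choices, not the authors' implementation.

```python
# Loose sketch of an Ask-when-Needed (AwN) style loop: act on an instruction,
# but route a clarifying question to the user whenever the model signals
# that the instruction is unclear. Assumes a local Ollama instance.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def chat(messages: list[dict]) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.2:3b", "messages": messages, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def ask_when_needed(instruction: str, max_clarifications: int = 3) -> str:
    """Act on an instruction, but ask the user first when it is unclear."""
    history = [
        {"role": "system",
         "content": ("If the user's instruction is missing details you need, "
                     "reply with exactly one clarifying question prefixed by "
                     "'QUESTION:'. Otherwise, carry out the instruction.")},
        {"role": "user", "content": instruction},
    ]
    for _ in range(max_clarifications):
        reply = chat(history)
        if not reply.strip().startswith("QUESTION:"):
            return reply                        # clear enough: just act
        answer = input(reply.strip() + "\n> ")  # route the question to the user
        history += [{"role": "assistant", "content": reply},
                    {"role": "user", "content": answer}]
    return chat(history)  # clarification budget spent: best-effort answer

if __name__ == "__main__":
    print(ask_when_needed("Book the meeting room for the demo."))
```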
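Likewise, for the simulated trial and error (STE) entry, a very rough sketch of the exploration phase only: the model imagines queries for a toy tool, the trials are executed, and the outcomes (including failures) land in a memory that could later seed in-context examples or fine-tuning. The unit_convert tool, the prompts, and the memory format are all assumptions for illustration.

```python
# Very loose sketch of simulated trial and error (STE), exploration phase
# only: imagine queries for a tool, try calls, and remember the outcomes.
# The toy tool, prompts, and memory format are illustrative assumptions;
# the paper's actual method (and its fine-tuning stage) differs.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

def chat(prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.2:3b", "stream": False,
              "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def unit_convert(value: float, frm: str, to: str) -> float:
    """Toy tool: converts between metres and feet only."""
    factors = {("m", "ft"): 3.28084, ("ft", "m"): 1 / 3.28084}
    return value * factors[(frm, to)]

memory = []  # (imagined query, extracted args, observed result or error)
schema = '{"value": 1.0, "frm": "m", "to": "ft"}'

for _ in range(5):  # imagination -> trial -> memory
    query = chat("Invent one short user question answerable by converting "
                 "between metres and feet. Reply with the question only.")
    args = chat(f"Extract the arguments for unit_convert from this question: "
                f"{query!r}. Reply with a JSON object shaped like {schema}, "
                f"and nothing else.")
    try:
        result = unit_convert(**json.loads(args))
    except Exception as err:  # failed trials are remembered, too
        result = f"error: {err}"
    memory.append((query, args, result))

print(json.dumps(memory, indent=2))  # memory could seed ICL or fine-tuning
```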