Automated Assessment of Students' Code Comprehension using LLMs
- URL: http://arxiv.org/abs/2401.05399v1
- Date: Tue, 19 Dec 2023 20:39:12 GMT
- Title: Automated Assessment of Students' Code Comprehension using LLMs
- Authors: Priti Oli, Rabin Banjade, Jeevan Chapagain, Vasile Rus
- Abstract summary: Large Language Models (LLMs) and encoder-based Semantic Textual Similarity (STS) models are assessed.
Our findings indicate that LLMs, when prompted in few-shot and chain-of-thought settings, perform comparably to fine-tuned encoder-based models in evaluating students' short answers in the programming domain.
- Score: 0.3293989832773954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assessing students' answers, and in particular natural language answers, is a
crucial challenge in the field of education. Advances in machine learning,
including transformer-based models such as Large Language Models (LLMs), have
led to significant progress in various natural language tasks. Nevertheless,
amidst the growing trend of evaluating LLMs across diverse tasks, evaluating
LLMs in the realm of automated answer assessment has not received much
attention. To address this gap, we explore the potential of using LLMs for
automated assessment of students' short and open-ended answers. In particular, we
use LLMs to compare students' explanations with expert explanations in the
context of line-by-line explanations of computer programs.
For comparison purposes, we assess both Large Language Models (LLMs) and
encoder-based Semantic Textual Similarity (STS) models in the context of
assessing the correctness of students' explanations of computer code. Our
findings indicate that LLMs, when prompted in few-shot and chain-of-thought
settings, perform comparably to fine-tuned encoder-based models in evaluating
students' short answers in the programming domain.
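The two assessment routes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `bow_cosine` is a toy bag-of-words stand-in for an encoder-based STS model, and `build_few_shot_cot_prompt`, its prompt wording, and the example record fields are all hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors.

    Assumption: a toy stand-in for an encoder-based STS score,
    not a fine-tuned model.
    """
    va = Counter(re.findall(r"[a-z0-9_]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9_]+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_few_shot_cot_prompt(code_line, expert, student, examples):
    """Assemble a few-shot, chain-of-thought grading prompt.

    Assumption: the wording and the keys of each example dict
    (code, expert, student, reasoning, label) are hypothetical.
    """
    parts = [
        "Decide whether the student's explanation of the code line "
        "matches the expert's explanation.",
        "Reason step by step, then answer 'correct' or 'incorrect'.",
        "",
    ]
    # Each worked example shows the reasoning before the label.
    for ex in examples:
        parts += [
            f"Code: {ex['code']}",
            f"Expert: {ex['expert']}",
            f"Student: {ex['student']}",
            f"Reasoning: {ex['reasoning']}",
            f"Answer: {ex['label']}",
            "",
        ]
    # The final item is left open for the model to complete.
    parts += [
        f"Code: {code_line}",
        f"Expert: {expert}",
        f"Student: {student}",
        "Reasoning:",
    ]
    return "\n".join(parts)
```

In such a setup, the similarity score would be compared against a threshold to label an answer, while the prompt would be sent to an LLM whose final "Answer:" line is parsed as the label; both steps are assumptions about how one might wire this up.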
Related papers
- What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Evaluating Language Models for Generating and Judging Programming Feedback [4.743413681603463]
Large language models (LLMs) have transformed research and practice across a wide range of domains.
We evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments.
arXiv Detail & Related papers (2024-07-05T21:44:11Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Models (LLMs) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
- Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
- Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study [0.0]
This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems.
Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems.
arXiv Detail & Related papers (2023-07-13T02:31:55Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.