Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
- URL: http://arxiv.org/abs/2508.11364v1
- Date: Fri, 15 Aug 2025 09:59:22 GMT
- Title: Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
- Authors: Sylvio RĂ¼dian, Yassin Elsir, Marvin Kretschmer, Sabine Cayrou, Niels Pinkwart,
- Abstract summary: It is essential first to extract relevant indicators, as these serve as the foundation upon which the feedback is constructed.<n>This study examines the initial phase of extracting such indicators from students' submissions of a language learning course using the large language model Llama 3.1.<n>The findings demonstrate statistically significant strong correlations, even in cases involving unanticipated combinations of indicators and criteria.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated feedback generation has the potential to enhance students' learning progress by providing timely and targeted feedback. Moreover, it can assist teachers in optimizing their time, allowing them to focus on more strategic and personalized aspects of teaching. To generate high-quality, information-rich formative feedback, it is essential first to extract relevant indicators, as these serve as the foundation upon which the feedback is constructed. Teachers often employ feedback criteria grids composed of various indicators that they evaluate systematically. This study examines the initial phase of extracting such indicators from students' submissions of a language learning course using the large language model Llama 3.1. Accordingly, the alignment between indicators generated by the LLM and human ratings across various feedback criteria is investigated. The findings demonstrate statistically significant strong correlations, even in cases involving unanticipated combinations of indicators and criteria. The methodology employed in this paper offers a promising foundation for extracting indicators from students' submissions using LLMs. Such indicators can potentially be utilized to auto-generate explainable and transparent formative feedback in future research.
Related papers
- Scaling Equitable Reflection Assessment in Education via Large Language Models and Role-Based Feedback Agents [2.825140278227664]
Formative feedback is one of the most effective drivers of student learning.<n>In large or low-resource courses, instructors often lack the time, staffing, and bandwidth required to review and respond to every student reflection.<n>This paper presents a theory-grounded system that uses five coordinated role-based LLM agents to score learner reflections.
arXiv Detail & Related papers (2025-11-14T09:46:21Z) - Measuring Teaching with LLMs [4.061135251278187]
This paper uses custom Large Language Models built on sentence-level embeddings.<n>We show that these specialized models can achieve human-level and even super-human performance with expert human ratings above 0.65.<n>We also find that aggregate model scores align with teacher value-added measures, indicating they are capturing features relevant to student learning.
arXiv Detail & Related papers (2025-10-27T03:42:04Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [62.23671919314693]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding.<n>However, their ability to attend to truly critical information during long-context reasoning and generation still falls behind the pace.<n>We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFeval, an evaluation framework designed to assess instruction-following capabilities.<n>Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training.<n>Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z) - Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? [3.7498611358320733]
Large Language Models (LLMs) are widely used in Automated Essay Scoring (AES)<n>This study explores the relationship between the model's predictive power of students' demographic attributes based on their written works and its predictive bias in the scoring task in the prompt-based paradigm.
arXiv Detail & Related papers (2025-04-30T05:36:28Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.<n>We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - LLMs as Educational Analysts: Transforming Multimodal Data Traces into Actionable Reading Assessment Reports [6.523137821124204]
This study investigates the use of multimodal data sources to derive meaningful reading insights.<n>We employ unsupervised learning techniques to identify distinct reading behavior patterns.<n>A large language model (LLM) synthesizes the derived information into actionable reports for educators.
arXiv Detail & Related papers (2025-03-03T22:34:08Z) - Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models [8.020688053947547]
One of the key strengths of Large Language Models (LLMs) is their ability to interact with humans by generating appropriate responses to given instructions.<n>This ability, known as instruction-following capability, has established a foundation for the use of LLMs across various fields.<n>We have noted that LLMs can become easily distracted by instruction-formatted statements, which may lead to an oversight of their instruction comprehension skills.
arXiv Detail & Related papers (2024-12-27T04:37:39Z) - How Good is ChatGPT in Giving Adaptive Guidance Using Knowledge Graphs in E-Learning Environments? [0.8999666725996978]
This study introduces an approach that integrates dynamic knowledge graphs with large language models (LLMs) to offer nuanced student assistance.<n>Central to this method is the knowledge graph's role in assessing a student's comprehension of topic prerequisites.<n>Preliminary findings suggest students could benefit from this tiered support, achieving enhanced comprehension and improved task outcomes.
arXiv Detail & Related papers (2024-12-05T04:05:43Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning [63.63840740526497]
We investigate how instruction tuning adjusts pre-trained models with a focus on intrinsic changes.
The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models.
Our findings reveal three significant impacts of instruction tuning.
arXiv Detail & Related papers (2023-09-30T21:16:05Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.