Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach
- URL: http://arxiv.org/abs/2209.05467v3
- Date: Fri, 2 Aug 2024 12:27:17 GMT
- Title: Modelling Assessment Rubrics through Bayesian Networks: a Pragmatic Approach
- Authors: Francesca Mangili, Giorgia Adorni, Alberto Piatti, Claudio Bonesana, Alessandro Antonucci
- Abstract summary: This paper presents an approach to deriving a learner model directly from an assessment rubric.
We illustrate how the approach can be applied to automate the human assessment of an activity developed for testing computational thinking skills.
- Score: 40.06500618820166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic assessment of learner competencies is a fundamental task in intelligent tutoring systems. An assessment rubric typically and effectively describes relevant competencies and competence levels. This paper presents an approach to deriving a learner model directly from an assessment rubric defining some (partial) ordering of competence levels. The model is based on Bayesian networks and exploits logical gates with uncertainty (often referred to as noisy gates) to reduce the number of parameters of the model, so as to simplify their elicitation by experts and allow real-time inference in intelligent tutoring systems. We illustrate how the approach can be applied to automate the human assessment of an activity developed for testing computational thinking skills. The simple elicitation of the model starting from the assessment rubric opens up the possibility of quickly automating the assessment of several tasks, making them more easily exploitable in the context of adaptive assessment tools and intelligent tutoring systems.
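The parameter saving mentioned in the abstract comes from the noisy-OR factorization: a gate over n binary parents needs only n inhibition probabilities (plus an optional leak) instead of a full conditional probability table with 2^n entries. Below is a minimal Python sketch of how such a gate's table can be built; the function name and parameterization are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def noisy_or_cpt(inhibitions, leak=0.0):
    """Build the conditional probability table of a noisy-OR gate.

    inhibitions[i] is the probability that an active parent i fails
    to activate the child; leak is the probability that the child
    activates even with all parents inactive. The whole table is
    defined by len(inhibitions) + 1 parameters instead of 2**n entries.
    """
    cpt = {}
    for config in product((0, 1), repeat=len(inhibitions)):
        p_off = 1.0 - leak  # child stays off if the leak does not fire...
        for active, q in zip(config, inhibitions):
            if active:
                p_off *= q  # ...and every active parent is inhibited
        cpt[config] = 1.0 - p_off
    return cpt

# Example: a competence observed through two skills.
cpt = noisy_or_cpt([0.2, 0.1], leak=0.05)
print(cpt[(1, 1)])  # 1 - 0.95 * 0.2 * 0.1 = 0.981
```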
Related papers
- AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment [12.970776782360366]
AERA Chat is an interactive platform that provides visually explained assessments of student answers.
Users can input questions and student answers to obtain automated, explainable assessment results from large language models.
arXiv Detail & Related papers (2024-10-12T11:57:53Z)
- Rubric-based Learner Modelling via Noisy Gates Bayesian Networks for Computational Thinking Skills Assessment [40.06500618820166]
We develop a learner model for automatic skill assessment from a task-specific competence rubric.
We design a network with two layers of gates, one performing disjunctive operations via noisy-OR gates and the other conjunctive operations via logical ANDs (a toy sketch of one such gate layout follows this list).
The CT-cube skills assessment framework and the Cross Array Task (CAT) are used to exemplify the approach and demonstrate its feasibility.
arXiv Detail & Related papers (2024-08-02T12:21:05Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Knowledge State Networks for Effective Skill Assessment in Atomic Learning [0.0]
This paper introduces a new framework for fast and effective knowledge state assessments in the context of personalized, skill-based online learning.
We use knowledge state networks - specific neural networks trained on assessment data of previous learners - to predict the full knowledge state of other learners from only partial information about their skills.
arXiv Detail & Related papers (2021-05-17T11:05:59Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems [3.269851859258154]
A common way of circumventing label scarcity is to pre-train a model to learn representations of the contents of learning items.
We propose Assessment Modeling, a class of fundamental pre-training tasks for general interactive educational systems.
arXiv Detail & Related papers (2020-01-01T02:00:07Z)
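The two-layer gate structure from the Rubric-based Learner Modelling entry above can be sketched as follows. This is a toy illustration under stated assumptions: the skill names and probabilities are hypothetical, the wiring (noisy-OR gates feeding a single deterministic AND) is one possible layout rather than the paper's exact network, and the closed-form AND step assumes the intermediate gates fire independently.

```python
from math import prod

def noisy_or_prob(parents, inhibitions, leak=0.0):
    """P(gate = 1) for a noisy-OR gate given binary parent values."""
    p_off = (1.0 - leak) * prod(q for x, q in zip(parents, inhibitions) if x)
    return 1.0 - p_off

def two_layer_gate(skills, groups, inhibitions, leak=0.0):
    """Disjunctive noisy-OR layer feeding one conjunctive AND node.

    skills      : dict mapping skill name to a 0/1 mastery indicator
    groups      : per-gate lists of skill names (the OR layer)
    inhibitions : per-gate lists of inhibition probabilities
    Treating the OR gates as independent, the deterministic AND
    fires with the product of their activation probabilities.
    """
    or_probs = [
        noisy_or_prob([skills[s] for s in g], q, leak)
        for g, q in zip(groups, inhibitions)
    ]
    return prod(or_probs)

# Hypothetical rubric: the task is solved if (loops OR conditionals)
# AND decomposition are mastered.
skills = {"loops": 1, "conditionals": 0, "decomposition": 1}
p = two_layer_gate(skills,
                   groups=[["loops", "conditionals"], ["decomposition"]],
                   inhibitions=[[0.1, 0.2], [0.05]],
                   leak=0.02)
print(p)  # P(task solved) under these toy parameters
```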
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.