Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
- URL: http://arxiv.org/abs/2412.11831v1
- Date: Mon, 16 Dec 2024 14:55:09 GMT
- Title: Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
- Authors: Leonidas Zotos, Hedderik van Rijn, Malvina Nissim
- Abstract summary: We show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly answer a question.
In addition to showing the value of our approach, we also observe that our model achieves state-of-the-art results on the publicly available BEA dataset.
- Abstract: In an educational setting, an estimate of the difficulty of multiple-choice questions (MCQs), a commonly used strategy to assess learning progress, constitutes very useful information for both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation have been investigated, so far with mixed success. Our approach takes a different angle from previous work: by asking various Large Language Models to tackle the questions in two different MCQ datasets, we leverage model uncertainty to estimate item difficulty. Using both model uncertainty features and textual features in a Random Forest regressor, we show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who answer a question correctly. In addition to showing the value of our approach, we observe that our model achieves state-of-the-art results on the publicly available BEA dataset.
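The pipeline described in the abstract can be sketched as follows; this is a minimal illustration with synthetic data, assuming entropy-style uncertainty features and simple textual features. The feature choices, data, and hyperparameters here are assumptions for illustration, not the authors' implementation.

```python
# Sketch: predict MCQ difficulty from model-uncertainty + textual features
# with a Random Forest regressor. All data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_questions = 200

# Hypothetical uncertainty features per question (e.g., from an LLM's
# distribution over the four answer options): choice entropy and
# 1 - max probability.
choice_probs = rng.dirichlet(np.ones(4), size=n_questions)
entropy = -np.sum(choice_probs * np.log(choice_probs), axis=1)
one_minus_max = 1.0 - choice_probs.max(axis=1)

# Hypothetical textual features: question length, option-length spread.
question_len = rng.integers(5, 60, size=n_questions).astype(float)
option_len_std = rng.uniform(0.0, 5.0, size=n_questions)

X = np.column_stack([entropy, one_minus_max, question_len, option_len_std])

# Synthetic target: difficulty rises with model uncertainty, plus noise.
y = 0.6 * entropy / np.log(4) + 0.1 * one_minus_max \
    + rng.normal(0.0, 0.05, n_questions)

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("mean CV R^2:", scores.mean())
```

On this synthetic data the uncertainty features carry most of the signal by construction; the paper's contribution is showing that real model uncertainty behaves similarly for real student data.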
Related papers
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [90.07275414500154]
We observe significant performance drops on MATH-P-Hard across various models.
We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills.
arXiv Detail & Related papers (2025-02-10T13:31:46Z)
- DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction [53.803276766404494]
Existing methods, which gauge a model's uncertainty through evaluating self-consistency in responses to the original query, do not always capture true uncertainty.
We propose a novel method, DiverseAgentEntropy, for evaluating a model's uncertainty using multi-agent interaction.
Our method offers a more accurate prediction of the model's reliability and further detects hallucinations, outperforming other self-consistency-based methods.
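The core intuition of treating disagreement across diverse answers as an uncertainty signal can be sketched minimally. The agent answers below are made up; the actual method elicits them through multi-agent interaction over varied formulations of the same query.

```python
# Sketch: entropy of the aggregated answer distribution as an
# uncertainty signal; zero entropy would mean full agreement.
import math
from collections import Counter

# Hypothetical answers to the same underlying question from six "agents".
agent_answers = ["Paris", "Paris", "Lyon", "Paris", "Paris", "Marseille"]

counts = Counter(agent_answers)
total = sum(counts.values())
entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
print(f"answer entropy: {entropy:.3f}")
```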
arXiv Detail & Related papers (2024-12-12T18:52:40Z)
- Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning [0.0]
Large Language Models (LLMs) have gained significant popularity in recent years for their ability to answer questions in various fields.
We introduce an analysis for evaluating the performance of popular open-source LLMs.
We focus on the relationship between answer accuracy and variability in topics related to physics.
arXiv Detail & Related papers (2024-11-18T13:42:13Z)
- Machine Unlearning in Forgettability Sequence [22.497699136603877]
We identify the key factors affecting unlearning difficulty and the performance of unlearning algorithms.
We propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
arXiv Detail & Related papers (2024-10-09T01:12:07Z)
- Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? [12.638577140117702]
We leverage an aspect of generative large models which might be seen as a weakness when answering questions.
We explore correlations between two different metrics of uncertainty, and the actual student response distribution.
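The kind of correlation analysis described above can be sketched as follows, assuming choice-probability entropy as the uncertainty metric; all probabilities and student error rates below are invented for illustration.

```python
# Sketch: correlate a model-uncertainty metric with observed student
# difficulty (fraction of students answering incorrectly).
import math
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-question model probabilities over four options.
model_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
    [0.30, 0.30, 0.20, 0.20],
    [0.70, 0.15, 0.10, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]
# Hypothetical fraction of students answering each question incorrectly.
student_error_rate = [0.10, 0.45, 0.60, 0.25, 0.70]

def entropy(p):
    """Shannon entropy of a probability vector (nats)."""
    return -sum(q * math.log(q) for q in p if q > 0)

uncertainty = [entropy(p) for p in model_probs]
r, _ = pearsonr(uncertainty, student_error_rate)
rho, _ = spearmanr(uncertainty, student_error_rate)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```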
arXiv Detail & Related papers (2024-07-07T10:48:04Z)
- Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering [63.12469700986452]
We introduce the concept of uncertainty-aware curriculum learning (CL).
Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty.
In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments.
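The easy-to-hard scheduling idea can be sketched in a few lines; the uncertainty scores and three-stage schedule below are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: order training examples by uncertainty and grow the training
# pool in stages, from most certain (easy) to least certain (hard).

# (example_id, uncertainty score in [0, 1]); higher = harder.
examples = [
    ("q1", 0.05), ("q2", 0.80), ("q3", 0.40),
    ("q4", 0.95), ("q5", 0.20), ("q6", 0.60),
]

# Sort from low to high uncertainty.
curriculum = sorted(examples, key=lambda e: e[1])

# Three stages: easiest third, then two thirds, then all examples.
stages = []
for frac in (1 / 3, 2 / 3, 1.0):
    k = max(1, round(frac * len(curriculum)))
    stages.append([eid for eid, _ in curriculum[:k]])

print(stages)
```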
arXiv Detail & Related papers (2024-01-03T02:29:34Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
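The reject-under-uncertainty behavior can be sketched as a simple confidence threshold; the threshold and probabilities below are illustrative, and the actual framework is more involved.

```python
# Sketch: answer only when the model's top-choice confidence clears a
# threshold, otherwise abstain (return None).

def answer_or_abstain(choice_probs, threshold=0.75):
    """Return the argmax answer index if confident enough, else None."""
    best = max(range(len(choice_probs)), key=lambda i: choice_probs[i])
    return best if choice_probs[best] >= threshold else None

print(answer_or_abstain([0.90, 0.05, 0.03, 0.02]))  # confident: answers 0
print(answer_or_abstain([0.40, 0.30, 0.20, 0.10]))  # uncertain: abstains
```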
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper is the first to explore Bayesian deep learning on learners' text posts, using two methods: Monte Carlo Dropout and Variational Inference.
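Monte Carlo Dropout uncertainty can be sketched without any deep-learning framework: keep dropout stochastic at inference time and use the spread of repeated forward passes as the uncertainty estimate. The toy one-layer model and data below are assumptions for illustration only.

```python
# Sketch: MC Dropout on a toy linear model; the std over repeated
# stochastic passes is the per-input uncertainty estimate.
import numpy as np

rng = np.random.default_rng(0)

# Fixed toy weights for a one-layer "model"; values are illustrative.
W = rng.normal(size=(8, 1))
x = rng.normal(size=(4, 8))  # 4 hypothetical forum-post feature vectors

def forward_with_dropout(x, W, p=0.5):
    """One stochastic pass: drop input units with probability p."""
    mask = rng.random(x.shape) >= p
    return (x * mask / (1 - p)) @ W  # inverted-dropout scaling

samples = np.stack([forward_with_dropout(x, W) for _ in range(100)])
mean = samples.mean(axis=0)        # predictive mean
uncertainty = samples.std(axis=0)  # predictive spread per input
print(uncertainty.ravel())
```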
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
- R2DE: a NLP approach to estimating IRT parameters of newly generated questions [3.364554138758565]
R2DE is a model capable of assessing newly generated multiple-choice questions by looking at the text of the question.
In particular, it can estimate the difficulty and the discrimination of each question.
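For context, the 2-parameter logistic IRT model behind the difficulty (b) and discrimination (a) parameters that R2DE estimates gives P(correct) = 1 / (1 + exp(-a(theta - b))) for a student of ability theta; a minimal sketch with illustrative numbers:

```python
# Sketch: 2PL item response theory model. At theta == b the probability
# of a correct answer is exactly 0.5; larger a makes the curve steeper.
import math

def p_correct(theta, a, b):
    """2PL IRT: P(student of ability theta answers the item correctly)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

print(p_correct(theta=0.0, a=1.0, b=0.0))  # ability equals difficulty: 0.5
print(p_correct(theta=1.0, a=2.0, b=0.0))  # able student, sharp item
```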
arXiv Detail & Related papers (2020-01-21T14:31:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.