Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items?
- URL: http://arxiv.org/abs/2510.25064v1
- Date: Wed, 29 Oct 2025 01:07:26 GMT
- Title: Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items?
- Authors: Seonjeong Hwang, Hyounghun Kim, Gary Geunbae Lee
- Abstract summary: Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before it is administered to learners. In this study, we examine whether large language models (LLMs) can estimate the cognitive complexity of RC items.
- Score: 19.75655994660427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before it is administered to learners. Unlike syntactic and semantic features, such as passage length or semantic similarity between options, cognitive features that arise during answer reasoning are not readily extractable using existing NLP tools and have traditionally relied on human annotation. In this study, we examine whether large language models (LLMs) can estimate the cognitive complexity of RC items by focusing on two dimensions-Evidence Scope and Transformation Level-that indicate the degree of cognitive burden involved in reasoning about the answer. Our experimental results demonstrate that LLMs can approximate the cognitive complexity of items, indicating their potential as tools for prior difficulty analysis. Further analysis reveals a gap between LLMs' reasoning ability and their metacognitive awareness: even when they produce correct answers, they sometimes fail to correctly identify the features underlying their own reasoning process.
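The abstract describes prompting LLMs to label two cognitive-complexity dimensions of an RC item. As an illustrative sketch only — the label sets, prompt wording, and output format below are assumptions, not the paper's actual protocol — such a classification step might look like:

```python
# Hypothetical label sets for the two dimensions; the paper's actual
# category definitions may differ.
EVIDENCE_SCOPE = ["single_sentence", "multi_sentence", "whole_passage"]
TRANSFORMATION_LEVEL = ["verbatim", "paraphrase", "inference"]

def build_prompt(passage: str, question: str, options: list[str]) -> str:
    """Assemble a classification prompt for one RC item."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        "Read the passage and question, then label two features of the "
        "reasoning needed to answer.\n"
        f"Passage: {passage}\nQuestion: {question}\nOptions:\n{opts}\n"
        f"Evidence Scope (one of {EVIDENCE_SCOPE}) and "
        f"Transformation Level (one of {TRANSFORMATION_LEVEL}).\n"
        "Answer as: scope=<label>; transform=<label>"
    )

def parse_labels(reply: str) -> tuple[str, str]:
    """Extract the two labels from a reply like 'scope=x; transform=y'."""
    fields = dict(part.strip().split("=") for part in reply.split(";"))
    scope, transform = fields["scope"], fields["transform"]
    assert scope in EVIDENCE_SCOPE and transform in TRANSFORMATION_LEVEL
    return scope, transform
```

The model's labels could then be compared against human annotations to measure how well the LLM approximates annotated cognitive complexity.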
Related papers
- Toward Cognitive Supersensing in Multimodal Large Language Model [67.15559571626747]
We introduce Cognitive Supersensing, a training paradigm that endows MLLMs with human-like visual imagery capabilities. In experiments, MLLMs trained with Cognitive Supersensing significantly outperform state-of-the-art baselines on CogSense-Bench. We will open-source the CogSense-Bench and our model weights.
arXiv Detail & Related papers (2026-02-02T02:19:50Z)
- UniCog: Uncovering Cognitive Abilities of LLMs through Latent Mind Space Analysis [69.50752734049985]
A growing body of research suggests that the cognitive processes of large language models (LLMs) differ fundamentally from those of humans. We propose UniCog, a unified framework that analyzes LLM cognition via a latent mind space.
arXiv Detail & Related papers (2026-01-25T16:19:00Z)
- Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning [54.12174882424842]
Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. We propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads.
arXiv Detail & Related papers (2025-12-03T10:24:34Z)
- How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs [13.822169295436177]
We investigate how large language models (LLMs) process the temporal meaning of linguistic aspect in narratives that were previously used in human studies. Our findings show that LLMs over-rely on prototypicality, produce inconsistent aspectual judgments, and struggle with causal reasoning derived from aspect. These results suggest that LLMs process aspect fundamentally differently from humans and lack robust narrative understanding.
arXiv Detail & Related papers (2025-07-18T18:28:35Z)
- Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty [2.335292678914151]
This study investigates the effectiveness of Large Language Models (LLMs) in estimating the difficulty of reading comprehension questions. We use OpenAI's GPT-4o and o1 to estimate question difficulty on the Study Aid and Reading Assessment (SARA) dataset. The results indicate that while the models yield difficulty estimates that align meaningfully with derived IRT parameters, there are notable differences in their sensitivity to extreme item characteristics.
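The summary above mentions comparing LLM difficulty estimates against derived IRT parameters. A minimal, standard-library sketch of such a comparison (not the SARA study's actual code; the toy numbers are illustrative) is a Spearman rank correlation between the two sets of values:

```python
def ranks(xs):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

llm_estimates = [2.0, 1.5, 3.0, 0.5]   # toy numbers, not real data
irt_difficulty = [1.8, 1.2, 2.9, 0.4]
print(round(spearman(llm_estimates, irt_difficulty), 3))  # → 1.0 (same ordering)
```

A rank correlation is a natural choice here because it rewards agreement in item ordering without requiring the LLM's difficulty scale to match the IRT scale.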
arXiv Detail & Related papers (2025-02-25T02:28:48Z)
- Disentangling Memory and Reasoning Ability in Large Language Models [97.26827060106581]
We propose a new inference paradigm that decomposes the complex inference process into two distinct and clear actions. Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z)
- From Feature Importance to Natural Language Explanations Using LLMs with RAG [4.204990010424084]
We introduce traceable question-answering, leveraging an external knowledge repository to inform the responses of Large Language Models (LLMs).
This knowledge repository comprises contextual details regarding the model's output, containing high-level features, feature importance, and alternative probabilities.
We integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt, guiding the response generation process.
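The summary above describes folding four explanation characteristics into a single-shot prompt grounded in feature importances. A hypothetical rendering of such a template (the wording and function names are assumptions, not the paper's actual prompt) might look like:

```python
# Assumed phrasings of the four characteristics drawn from social-science
# research on human explanations; the paper's exact instructions may differ.
CHARACTERISTICS = {
    "social": "address the user directly, in plain language",
    "causal": "state which features drove the prediction and why",
    "selective": "mention only the few most important features",
    "contrastive": "explain why this outcome rather than the alternative",
}

def explanation_prompt(prediction: str, feature_importance: dict[str, float]) -> str:
    """Build a single-shot prompt grounded in retrieved feature importances."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in CHARACTERISTICS.items())
    # Sort features by magnitude so the prompt foregrounds the strongest ones.
    feats = ", ".join(
        f"{k}={v:+.2f}"
        for k, v in sorted(feature_importance.items(), key=lambda kv: -abs(kv[1]))
    )
    return (
        f"The model predicted: {prediction}.\n"
        f"Feature importances (from the knowledge repository): {feats}.\n"
        "Write a natural-language explanation that is:\n" + rules
    )

print(explanation_prompt("loan approved", {"income": 0.42, "age": -0.05}))
```

The retrieved feature importances play the RAG role here: the LLM's explanation is constrained by repository facts rather than generated freely.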
arXiv Detail & Related papers (2024-07-30T17:27:20Z)
- Do Large Language Models Mirror Cognitive Language Processing? [43.68923267228057]
Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning. Brain cognitive processing signals are typically utilized to study human language processing.
arXiv Detail & Related papers (2024-02-28T03:38:20Z)
- Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z)
- Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.