Analyzing Feedback Mechanisms in AI-Generated MCQs: Insights into Readability, Lexical Properties, and Levels of Challenge
- URL: http://arxiv.org/abs/2504.21013v1
- Date: Sat, 19 Apr 2025 09:20:52 GMT
- Title: Analyzing Feedback Mechanisms in AI-Generated MCQs: Insights into Readability, Lexical Properties, and Levels of Challenge
- Authors: Antoun Yaacoub, Zainab Assaghir, Lionel Prevost, Jérôme Da-Rugna
- Abstract summary: This study delves into the linguistic and structural attributes of feedback generated by Google's Gemini 1.5-flash text model for computer science multiple-choice questions (MCQs). Key linguistic metrics, such as length, readability scores (Flesch-Kincaid Grade Level), vocabulary richness, and lexical density, were computed and examined. The findings reveal significant interaction effects between feedback tone and question difficulty, demonstrating the dynamic adaptation of AI-generated feedback within diverse educational contexts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial Intelligence (AI)-generated feedback in educational settings has garnered considerable attention due to its potential to enhance learning outcomes. However, a comprehensive understanding of the linguistic characteristics of AI-generated feedback, including readability, lexical richness, and adaptability across varying challenge levels, remains limited. This study delves into the linguistic and structural attributes of feedback generated by Google's Gemini 1.5-flash text model for computer science multiple-choice questions (MCQs). A dataset of over 1,200 MCQs was analyzed, considering three difficulty levels (easy, medium, hard) and three feedback tones (supportive, neutral, challenging). Key linguistic metrics, such as length, readability scores (Flesch-Kincaid Grade Level), vocabulary richness, and lexical density, were computed and examined. A fine-tuned RoBERTa-based multi-task learning (MTL) model was trained to predict these linguistic properties, achieving a Mean Absolute Error (MAE) of 2.0 for readability and 0.03 for vocabulary richness. The findings reveal significant interaction effects between feedback tone and question difficulty, demonstrating the dynamic adaptation of AI-generated feedback within diverse educational contexts. These insights contribute to the development of more personalized and effective AI-driven feedback mechanisms, highlighting the potential for improved learning outcomes while underscoring the importance of ethical considerations in their design and deployment.
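The paper's analysis code is not reproduced here, but the metrics it names are standard. Below is a minimal Python sketch, assuming plain-text feedback strings: it uses the published Flesch-Kincaid Grade Level formula, a type-token ratio as a simple proxy for vocabulary richness, and a crude stopword filter in place of the POS tagging a real lexical-density computation would use.

```python
import re

STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "that", "this", "it", "be", "as", "for", "on", "with", "so"}

def count_syllables(word: str) -> int:
    """Naive vowel-group heuristic; dictionary-based counters are more accurate."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def feedback_metrics(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return {
        "length": len(words),
        # Flesch-Kincaid Grade Level (standard published formula).
        "fk_grade": 0.39 * len(words) / len(sentences)
                    + 11.8 * syllables / len(words) - 15.59,
        # Vocabulary richness as a type-token ratio.
        "vocab_richness": len({w.lower() for w in words}) / len(words),
        # Lexical density approximated as the share of non-stopword tokens.
        "lexical_density": sum(w.lower() not in STOPWORDS for w in words) / len(words),
    }

print(feedback_metrics(
    "Not quite. A stack is last-in, first-out, so the most recent push is popped first."
))
```

Averaging such metrics per (difficulty, tone) cell is enough to reproduce the kind of tone-by-difficulty interaction analysis the abstract reports; the fine-tuned RoBERTa MTL model then learns to predict the same targets directly from the feedback text.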
Related papers
- What Makes a Good Natural Language Prompt? [72.3282960118995]
We conduct a meta-analysis surveying more than 150 prompting-related papers published at leading NLP and AI conferences between 2022 and 2025. We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions. We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact.
arXiv Detail & Related papers (2025-06-07T23:19:27Z)
- Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty [2.335292678914151]
This study investigates the effectiveness of Large Language Models (LLMs) in estimating the difficulty of reading comprehension questions. We use OpenAI's GPT-4o and o1 to estimate the difficulty of reading comprehension questions from the Study Aid and Reading Assessment (SARA) dataset. The results indicate that while the models yield difficulty estimates that align meaningfully with derived IRT parameters, there are notable differences in their sensitivity to extreme item characteristics.
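The study's exact prompts and rating scale are not given in this summary; as a hypothetical sketch of the general setup (the system prompt, 1-5 scale, and helper name are illustrative, not the paper's protocol), querying GPT-4o through the OpenAI Python client might look like:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def estimate_difficulty(passage: str, question: str) -> int:
    """Ask the model for a 1-5 difficulty rating; prompt wording is illustrative."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rate how difficult the reading comprehension question is "
                        "on a scale from 1 (very easy) to 5 (very hard). "
                        "Answer with the number only."},
            {"role": "user",
             "content": f"Passage:\n{passage}\n\nQuestion:\n{question}"},
        ],
    )
    # Assumes the model complies with the format; production code would validate.
    return int(response.choices[0].message.content.strip())
```

Ratings collected this way can then be correlated with IRT difficulty parameters fitted on learner response data.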
arXiv Detail & Related papers (2025-02-25T02:28:48Z)
- Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease [52.46922921214341]
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society. We devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model. Our new features can be explained and interpreted step by step, which enhances the interpretability of automatic AD screening.
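As a sketch of the TF-IDF half of such a feature set only (the LLM-derived features are out of scope here), assuming a list of transcribed speech samples with binary labels, scikit-learn reduces this to a short pipeline; the transcripts below are placeholders, not data from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder transcripts of spontaneous speech; labels: 0 = control, 1 = AD.
transcripts = ["well there is a boy on a stool reaching for the jar",
               "the water is overflowing in the sink"]
labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(transcripts, labels)
```

Because each TF-IDF weight stays tied to a concrete n-gram, the resulting features remain inspectable, which is the interpretability property the paper emphasizes.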
arXiv Detail & Related papers (2024-11-28T05:23:22Z)
- The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models [0.0]
Large language models (LLMs) and generative AI have revolutionized natural language processing (NLP).
This chapter explores the transformative potential of LLMs in automated question generation and answer assessment.
arXiv Detail & Related papers (2024-10-12T15:54:53Z)
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z)
- The Honorific Effect: Exploring the Impact of Japanese Linguistic Formalities on AI-Generated Physics Explanations [0.0]
This study investigates the influence of Japanese honorifics on the responses of large language models (LLMs) when explaining the law of conservation of momentum.
We analyzed the outputs of six state-of-the-art AI models, including variations of ChatGPT, Coral, and Gemini, using 14 different honorific forms.
arXiv Detail & Related papers (2024-07-12T11:31:00Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
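The summary does not specify the paper's concept set or architecture; the following is a generic concept-bottleneck head over a pretrained encoder, sketched in PyTorch (the model name, num_concepts, and [CLS]-pooling choice are assumptions), where the label is predicted only from human-interpretable concept activations:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ConceptBottleneckClassifier(nn.Module):
    """PLM encoder -> interpretable concept scores -> label predicted from concepts."""

    def __init__(self, plm_name: str, num_concepts: int, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)
        hidden = self.encoder.config.hidden_size
        self.to_concepts = nn.Linear(hidden, num_concepts)   # the bottleneck
        self.to_labels = nn.Linear(num_concepts, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]                 # [CLS]-style pooling
        concepts = torch.sigmoid(self.to_concepts(pooled))   # per-concept activations
        return concepts, self.to_labels(concepts)
```

Routing the prediction through the concept layer is what makes the model's decisions explainable: each logit can be traced back to named concept scores.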
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
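Since Re2 is purely a prompt-level change, it can be sketched in a few lines; the template wording below is approximated from the paper's description and combined here with a zero-shot chain-of-thought trigger:

```python
def re2_prompt(question: str) -> str:
    """Re2: present the input twice so the model 're-reads' before reasoning."""
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A: Let's think step by step."
    )

print(re2_prompt("If a train travels 60 km in 40 minutes, what is its speed in km/h?"))
```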
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
- Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine [5.126042819606137]
We focus on three key methodological approaches that address some of the particular challenges in AI-driven medical decision making.
Domain adaptation and transfer learning enable AI models to be trained and applied across multiple domains.
Federated learning enables learning large-scale models without exposing sensitive personal health information.
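The chapter is a survey, so no single algorithm is implied; as one concrete illustration, the canonical FedAvg aggregation step, where only model parameters (never patient records) leave each site and are averaged weighted by local data size, can be sketched as:

```python
import numpy as np

def fedavg(client_params: list[list[np.ndarray]],
           client_sizes: list[int]) -> list[np.ndarray]:
    """FedAvg aggregation: size-weighted average of per-client parameter lists."""
    total = sum(client_sizes)
    return [
        sum(params[i] * (n / total) for params, n in zip(client_params, client_sizes))
        for i in range(len(client_params[0]))
    ]

# Two hypothetical hospital clients, each holding one weight matrix locally.
a = [np.ones((2, 2))]
b = [np.zeros((2, 2))]
print(fedavg([a, b], client_sizes=[30, 10]))  # weights of 0.75, favoring the larger client
```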
arXiv Detail & Related papers (2022-11-17T03:32:00Z)
- Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis [67.41078214475341]
We propose Dynamic Re-weighting BERT (DR-BERT) to learn dynamic aspect-oriented semantics for ABSA.
Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence.
We then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA).
arXiv Detail & Related papers (2022-03-30T14:48:46Z)
- Robustness Testing of Language Understanding in Dialog Systems [33.30143655553583]
We conduct a comprehensive evaluation and analysis of the robustness of natural language understanding models.
We introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation.
We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems.
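LAUG's actual interface is not reproduced here; as a hypothetical stand-in for its noise-perturbation aspect (the function name and adjacent-swap heuristic are invented for illustration), a character-level perturbation for robustness testing might look like:

```python
import random

def typo_perturb(utterance: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly swap adjacent letters to simulate typing noise (illustrative only)."""
    rng = random.Random(seed)
    chars = list(utterance)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Compare NLU accuracy on clean vs. perturbed utterances to measure robustness.
print(typo_perturb("book a table for two at seven pm", rate=0.2))
```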
arXiv Detail & Related papers (2020-12-30T18:18:47Z)