Related papers: Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

URL: http://arxiv.org/abs/2511.10903v1
Date: Fri, 14 Nov 2025 02:31:12 GMT
Title: Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy
Authors: Ramya Kumar, Dhruv Gulwani, Sonit Singh,
Abstract summary: This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's taxonomy.<n>A small dataset of 600 sentences labeled with six cognitive categories was processed using traditional machine learning (ML) models.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's Taxonomy. A small dataset of 600 sentences labeled with six cognitive categories - Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation - was processed using traditional machine learning (ML) models (Naive Bayes, Logistic Regression, Support Vector Machines), recurrent neural network architectures (LSTM, BiLSTM, GRU, BiGRU), transformer-based models (BERT and RoBERTa), and large language models (OpenAI, Gemini, Ollama, Anthropic). Each model was evaluated under different preprocessing and augmentation strategies (for example, synonym replacement, word embeddings, etc.). Among traditional ML approaches, Support Vector Machines (SVM) with data augmentation achieved the best overall performance, reaching 94 percent accuracy, recall, and F1 scores with minimal overfitting. In contrast, the RNN models and BERT suffered from severe overfitting, while RoBERTa initially overcame it but began to show signs as training progressed. Finally, zero-shot evaluations of large language models (LLMs) indicated that OpenAI and Gemini performed best among the tested LLMs, achieving approximately 0.72-0.73 accuracy and comparable F1 scores. These findings highlight the challenges of training complex deep models on limited data and underscore the value of careful data augmentation and simpler algorithms (such as augmented SVM) for Bloom's Taxonomy classification.

Related papers

AI Generated Text Detection [0.0]
This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures.<n>We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage.<n>Results indicate that contextual modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization.
arXiv Detail & Related papers (2026-01-07T11:18:10Z)
Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data [33.7054351451505]
We introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model.<n>We show that Agentomics-ML outperforms existing state-of-the-art agent-based methods in both generalization and success rates.
arXiv Detail & Related papers (2025-06-05T19:44:38Z)
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z)
Analysis of Zero Day Attack Detection Using MLP and XAI [0.0]
This paper analyzes Machine Learning (ML) and Deep Learning (DL) based approaches to create Intrusion Detection Systems (IDS)<n>The focus is on using the KDD99 dataset, which has the most research done among all the datasets for detecting zero-day attacks.<n>We evaluate the performance of four multilayer perceptron (MLP) trained on the KDD99 dataset, including baseline ML models, weighted ML models, truncated ML models, and weighted truncated ML models.
arXiv Detail & Related papers (2025-01-28T02:20:34Z)
Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four different approaches for crop classification, namely traditional ML with handcrafted feature extraction methods like SIFT, ORB, and Color Histogram; Custom Designed CNN and established DL architecture like AlexNet; transfer learning on five models pre-trained using ImageNet. Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [71.04084063541777]
Counterfactual learning to rank has attracted extensive attention in the IR community.<n>Models can be theoretically unbiased when the user behavior assumption is correct and the propensity estimation is accurate.<n>Their effectiveness is usually empirically evaluated via simulation-based experiments due to a lack of widely available, large-scale, real click logs.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
Scalable Learning of Item Response Theory Models [48.91265296134559]
Item Response Theory (IRT) models aim to assess latent abilities of $n$ examinees along with latent difficulty characteristics of $m$ test items from categorical data. We leverage the similarity of these models to logistic regression, which can be approximated accurately using small weighted subsets called coresets.
arXiv Detail & Related papers (2024-03-01T17:12:53Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL) We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder [0.0]
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process.
arXiv Detail & Related papers (2023-09-20T21:23:37Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
slice detection models (SDM) automatically identify underperforming groups of datapoints. This paper proposes a benchmark named "Discover, Explain, improve (DEIM)" for classification NLP tasks. Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs. We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography [0.0]
Driver drowsiness is one of the leading causes of traffic accidents. Brain-Computer Interfaces (BCIs) and Machine Learning (ML) have enabled the detection of drivers' drowsiness. This work presents an intelligent framework employing BCIs and features based on electroencephalography for detecting drowsiness in driving scenarios.
arXiv Detail & Related papers (2022-09-08T22:14:33Z)
No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data. We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model. Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE) In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.