Related papers: Multi-Method Analysis of Mathematics Placement Assessments: Classical, Machine Learning, and Clustering Approaches

Multi-Method Analysis of Mathematics Placement Assessments: Classical, Machine Learning, and Clustering Approaches

URL: http://arxiv.org/abs/2511.04667v1
Date: Thu, 06 Nov 2025 18:53:07 GMT
Title: Multi-Method Analysis of Mathematics Placement Assessments: Classical, Machine Learning, and Clustering Approaches
Authors: Julian D. Allagan, Dasia A. Singleton, Shanae N. Perry, Gabrielle C. Morgan, Essence A. Morgan,
Abstract summary: This study evaluates a 40-item mathematics placement examination administered to 198 students.<n>It uses a multi-method framework combining Classical Test Theory, machine learning, and unsupervised clustering.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study evaluates a 40-item mathematics placement examination administered to 198 students using a multi-method framework combining Classical Test Theory, machine learning, and unsupervised clustering. Classical Test Theory analysis reveals that 55\% of items achieve excellent discrimination ($D \geq 0.40$) while 30\% demonstrate poor discrimination ($D < 0.20$) requiring replacement. Question 6 (Graph Interpretation) emerges as the examination's most powerful discriminator, achieving perfect discrimination ($D = 1.000$), highest ANOVA F-statistic ($F = 4609.1$), and maximum Random Forest feature importance (0.206), accounting for 20.6\% of predictive power. Machine learning algorithms demonstrate exceptional performance, with Random Forest and Gradient Boosting achieving 97.5\% and 96.0\% cross-validation accuracy. K-means clustering identifies a natural binary competency structure with a boundary at 42.5\%, diverging from the institutional threshold of 55\% and suggesting potential overclassification into remedial categories. The two-cluster solution exhibits exceptional stability (bootstrap ARI = 0.855) with perfect lower-cluster purity. Convergent evidence across methods supports specific refinements: replace poorly discriminating items, implement a two-stage assessment, and integrate Random Forest predictions with transparency mechanisms. These findings demonstrate that multi-method integration provides a robust empirical foundation for evidence-based mathematics placement optimization.

Related papers

Almost Asymptotically Optimal Active Clustering Through Pairwise Observations [59.20614082241528]
We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses.<n>We establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the accuracy of the clustering.<n>We develop a computationally feasible variant of the Generalized Likelihood Ratio statistic and show that its performance gap to the lower bound can be accurately empirically estimated.
arXiv Detail & Related papers (2026-02-05T14:16:47Z)
EvalQReason: A Framework for Step-Level Reasoning Evaluation in Large Language Models [0.8399688944263844]
We present EvalQReason, a framework that quantifies LLM reasoning quality through step-level probability distribution analysis.<n>The framework introduces two complementary algorithms: Consecutive Step Divergence (CSD), which measures local coherence between adjacent reasoning steps, and Step-to-Final Convergence (SFC), which assesses global alignment with final answers.
arXiv Detail & Related papers (2026-02-02T16:32:40Z)
Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging [18.71249153088185]
Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities.<n>We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities.
arXiv Detail & Related papers (2026-01-28T05:36:19Z)
Feature Selection and Regularization in Multi-Class Classification: An Empirical Study of One-vs-Rest Logistic Regression with Gradient Descent Optimization and L1 Sparsity Constraints [0.0]
Multi-class wine classification presents fundamental trade-offs between model accuracy, feature dimensionality, and interpretability.<n>This paper presents a comprehensive empirical study of One-vs-Rest logistic regression on the UCI Wine dataset.
arXiv Detail & Related papers (2025-10-16T08:47:05Z)
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data? [82.09573568241724]
EssenceBench is a coarse-to-fine framework utilizing an iterative Genetic Algorithm (GA)<n>Our approach yields superior compression results with lower reconstruction error and markedly higher efficiency.<n>On the HellaSwag benchmark (10K samples), our method preserves the ranking of all models shifting within 5% using 25x fewer samples, and achieves 95% ranking preservation shifting within 5% using only 200x fewer samples.
arXiv Detail & Related papers (2025-10-12T05:38:10Z)
Semiparametric Learning from Open-Set Label Shift Data [14.537408547515627]
We study the open-set label shift problem, where the test data may include a novel class absent from training.<n>This setting is challenging because both the class proportions and the distribution of the novel class are not identifiable without extra assumptions.<n>We propose a semiparametric density ratio model framework that ensures identifiability while allowing overlap between novel and known classes.
arXiv Detail & Related papers (2025-09-18T01:32:29Z)
How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization. This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z)
Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
dominance models risk-averse preferences for decision making with uncertain outcomes. Despite theoretically appealing, the application of dominance in machine learning has been scarce. We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z)
Analysis of Diagnostics (Part I): Prevalence, Uncertainty Quantification, and Machine Learning [0.0]
This manuscript is the first in a two-part series that studies deeper connections between classification theory and prevalence. We propose a numerical, homotopy algorithm that estimates the $Bstar (q)$ by minimizing a prevalence-weighted empirical error. We validate our methods in the context of synthetic data and a research-use-only SARS-CoV-2 enzyme-linked immunosorbent (ELISA) assay.
arXiv Detail & Related papers (2023-08-30T13:26:49Z)
ProBoost: a Boosting Method for Probabilistic Classifiers [55.970609838687864]
ProBoost is a new boosting algorithm for probabilistic classifiers. It uses the uncertainty of each training sample to determine the most challenging/uncertain ones. It produces a sequence that progressively focuses on the samples found to have the highest uncertainty.
arXiv Detail & Related papers (2022-09-04T12:49:20Z)
Minimax Pareto Fairness: A Multi Objective Perspective [24.600419295290504]
Group fairness is a multi-objective optimization problem, where each sensitive group risk is a separate objective. We provide a simple algorithm compatible with deep neural networks to satisfy these constraints. We test the proposed methodology on real case-studies of predicting income, ICU patient mortality, skin lesions classification, and assessing credit risk.
arXiv Detail & Related papers (2020-11-03T16:21:53Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.