Selective Mixup for Debiasing Question Selection in Computerized Adaptive Testing
- URL: http://arxiv.org/abs/2511.15241v2
- Date: Thu, 20 Nov 2025 11:20:32 GMT
- Title: Selective Mixup for Debiasing Question Selection in Computerized Adaptive Testing
- Authors: Mi Tian, Kun Zhang, Fei Liu, Jinglong Li, Yuxin Liao, Chenxi Bai, Zhengtao Tan, Le Wu, Richang Hong
- Abstract summary: Computerized Adaptive Testing (CAT) is a widely used technology for evaluating learners' proficiency in online education platforms. Selection Bias arises because the question selection is strongly influenced by the estimated proficiency. We propose a debiasing framework consisting of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization.
- Score: 50.805231979748434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computerized Adaptive Testing (CAT) is a widely used technology for evaluating learners' proficiency in online education platforms. By leveraging prior estimates of proficiency to select questions and updating the estimates iteratively based on responses, CAT enables personalized learner modeling and has attracted substantial attention. Despite this progress, most existing works focus primarily on improving diagnostic accuracy, while overlooking the selection bias inherent in the adaptive process. Selection Bias arises because the question selection is strongly influenced by the estimated proficiency, such as assigning easier questions to learners with lower proficiency and harder ones to learners with higher proficiency. Since the selection depends on prior estimation, this bias propagates into the diagnosis model, which is further amplified during iterative updates, leading to misalignment and biased predictions. Moreover, the imbalanced nature of learners' historical interactions often exacerbates the bias in diagnosis models. To address this issue, we propose a debiasing framework consisting of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization. First, we retrieve balanced examinees with relatively even distributions of correct and incorrect responses and use them as neutral references for biased examinees. Then, mixup is applied between each biased examinee and its matched balanced counterpart under label consistency. This augmentation enriches the diversity of bias-conflicting samples and smooths selection boundaries. Finally, extensive experiments on two benchmark datasets with multiple advanced diagnosis models demonstrate that our method substantially improves both the generalization ability and fairness of question selection in CAT.
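The two modules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tolerance around a 0.5 correct rate, the cosine-similarity matching, the Beta(alpha, alpha) mixing coefficient, and all function names (`retrieve_balanced`, `match_counterparts`, `selective_mixup`) are assumptions made for illustration, and each row of `features` simply stands in for one sample with one binary response label.

```python
import numpy as np

def retrieve_balanced(correct_rate, tol=0.1):
    """Indices of examinees whose correct-response rate is within tol of 0.5.
    The threshold rule is an assumption; the abstract only says 'relatively even'."""
    return np.flatnonzero(np.abs(correct_rate - 0.5) <= tol)

def match_counterparts(features, biased_idx, balanced_idx):
    """For each biased examinee, pick the most cosine-similar balanced examinee.
    Cosine similarity is an assumed matching criterion."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit[biased_idx] @ unit[balanced_idx].T
    return balanced_idx[np.argmax(sims, axis=1)]

def selective_mixup(x_biased, y_biased, x_balanced, y_balanced, alpha=0.4, rng=None):
    """Mix only the pairs whose labels agree (label consistency); drop the rest."""
    rng = np.random.default_rng() if rng is None else rng
    keep = y_biased == y_balanced
    # One mixing coefficient per kept pair, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=int(keep.sum()))[:, None]
    mixed = lam * x_biased[keep] + (1.0 - lam) * x_balanced[keep]
    return mixed, y_biased[keep]

# Toy data: four examinees, eight-dimensional features, one label per row.
rng = np.random.default_rng(0)
correct_rate = np.array([0.9, 0.5, 0.1, 0.55])
features = rng.normal(size=(4, 8))
labels = np.array([1, 1, 0, 1])

balanced_idx = retrieve_balanced(correct_rate)
biased_idx = np.setdiff1d(np.arange(len(correct_rate)), balanced_idx)
partners = match_counterparts(features, biased_idx, balanced_idx)
mixed_x, mixed_y = selective_mixup(
    features[biased_idx], labels[biased_idx],
    features[partners], labels[partners],
    rng=rng,
)
```

In this toy run, examinees 1 and 3 count as balanced and examinees 0 and 2 as biased. Only the pair with matching labels is mixed, so the augmented set contains a single sample that is a convex combination of a biased examinee and its balanced counterpart.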
Related papers
- Adam Simplified: Bias Correction Debunked [17.2249234816671]
This paper investigates the role of bias-correction, a feature whose contribution remains poorly understood. Through a series of systematic ablations on vision and language modelling tasks, we demonstrate that the conventional wisdom surrounding bias correction is misleading.
arXiv Detail & Related papers (2025-11-25T17:20:40Z)
- To Bias or Not to Bias: Detecting bias in News with bias-detector [1.8024397171920885]
We perform sentence-level bias classification by fine-tuning a RoBERTa-based model on the expert-annotated BABE dataset. We show statistically significant improvements in performance when comparing our model to a domain-adaptively pre-trained DA-RoBERTa baseline. Our findings contribute to building more robust, explainable, and socially responsible NLP systems for media bias detection.
arXiv Detail & Related papers (2025-05-19T11:54:39Z)
- DCAST: Diverse Class-Aware Self-Training Mitigates Selection Bias for Fairer Learning [0.0]
Bias unascribed to sensitive features is challenging to identify and typically goes undiagnosed.
Strategies to mitigate unidentified bias and evaluate mitigation methods are crucially needed, yet remain underexplored.
We introduce Diverse Class-Aware Self-Training (DCAST), model-agnostic mitigation aware of class-specific bias.
arXiv Detail & Related papers (2024-09-30T09:26:19Z)
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council [8.530912655468645]
Deep learning could be prone to learning shortcuts raised by dataset bias.
Despite its significance, there is a dearth of research in the medical image classification domain to address dataset bias.
This paper proposes learning Adaptive Agreement from a Biased Council (Ada-ABC), a debiasing framework that does not rely on explicit bias labels.
arXiv Detail & Related papers (2024-01-22T06:29:52Z)
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
- Improving Evaluation of Debiasing in Image Classification [29.711865666774017]
Our study indicates that several issues need to be addressed when evaluating debiasing in image classification.
Based on these issues, this paper proposes an evaluation metric, the Align-Conflict (AC) score, for the tuning criterion.
We believe our findings and lessons inspire future researchers in debiasing to further push state-of-the-art performances with fair comparisons.
arXiv Detail & Related papers (2022-06-08T05:24:13Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.