Addressing Selection Bias in Computerized Adaptive Testing: A User-Wise
Aggregate Influence Function Approach
- URL: http://arxiv.org/abs/2308.11912v1
- Date: Wed, 23 Aug 2023 04:57:21 GMT
- Authors: Soonwoo Kwon, Sojung Kim, Seunghyun Lee, Jin-Young Kim, Suyeong An,
and Kyuseok Kim
- Abstract summary: We propose a user-wise aggregate influence function method to tackle the selection bias issue.
Our intuition is to filter out users whose response data is heavily biased in an aggregate manner.
- Score: 14.175555669521987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computerized Adaptive Testing (CAT) is a widely used, efficient test mode
that adapts to the examinee's proficiency level in the test domain. CAT
requires pre-trained item profiles: it iteratively assesses the student in
real time based on the registered items' profiles and selects the next item to
administer using the candidate items' profiles. However, obtaining such item
profiles is a costly process that involves gathering large, dense
item-response data and then training a diagnostic model on the collected data. In
this paper, we explore the possibility of leveraging response data collected in
the CAT service. We first show that this poses a unique challenge due to the
inherent selection bias introduced by CAT, i.e., more proficient students will
receive harder questions. Indeed, when naively training the diagnostic model
using CAT response data, we observe that item profiles deviate significantly
from the ground-truth. To tackle the selection bias issue, we propose the
user-wise aggregate influence function method. Our intuition is to filter out
users whose response data are heavily biased in an aggregate manner, as judged
by how much perturbation their data would introduce during parameter
estimation. This way, we may enhance the performance of CAT while introducing
minimal bias to the item profiles. We provide extensive experiments to
demonstrate the superiority of our proposed method based on the three public
datasets and one dataset that contains real-world CAT response data.
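The filtering idea described in the abstract can be sketched on a toy Rasch (1PL) model, where P(correct) = sigmoid(theta_user - b_item). Everything below is an illustrative assumption rather than the paper's implementation: the synthetic data, the fixed (known) proficiencies, the diagonal-Hessian influence approximation, and the 10% drop ratio are all hypothetical choices made to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: a 1PL (Rasch) model with known proficiencies.
n_users, n_items, per_user = 60, 25, 10
theta = rng.normal(size=n_users)        # student proficiencies (held fixed here)
b_true = rng.normal(size=n_items)       # ground-truth item difficulties

# Each user answers a random subset of items (standing in for CAT-style data).
user_idx, item_idx, resp = [], [], []
for u in range(n_users):
    items = rng.choice(n_items, size=per_user, replace=False)
    p = 1.0 / (1.0 + np.exp(-(theta[u] - b_true[items])))
    y = (rng.random(per_user) < p).astype(float)
    user_idx += [u] * per_user
    item_idx += list(items)
    resp += list(y)
user_idx, item_idx, resp = map(np.asarray, (user_idx, item_idx, resp))

def fit_b(mask, lr=0.5, steps=2000):
    """Estimate item difficulties on the masked responses by gradient descent
    on the negative log-likelihood, holding theta fixed (a simplification)."""
    b = np.zeros(n_items)
    u, i, y = user_idx[mask], item_idx[mask], resp[mask]
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(theta[u] - b[i])))
        grad = np.zeros(n_items)        # d(neg log-lik)/db = y - p
        np.add.at(grad, i, y - p)
        b -= lr * grad / len(y)
    return b

all_mask = np.ones(len(resp), dtype=bool)
b_hat = fit_b(all_mask)

# First-order influence proxy: per-response gradient magnitude scaled by the
# inverse of the (diagonal) Hessian, then aggregated per user by summation.
p = 1.0 / (1.0 + np.exp(-(theta[user_idx] - b_hat[item_idx])))
hess_diag = np.zeros(n_items)           # d2(neg log-lik)/db2 = p(1 - p)
np.add.at(hess_diag, item_idx, p * (1.0 - p))
per_resp_infl = np.abs(resp - p) / np.maximum(hess_diag[item_idx], 1e-8)

agg_infl = np.zeros(n_users)
np.add.at(agg_infl, user_idx, per_resp_infl)

# Filter out the 10% of users with the largest aggregate influence and refit.
k = max(1, n_users // 10)
dropped = np.argsort(agg_infl)[-k:]
keep_mask = ~np.isin(user_idx, dropped)
b_refit = fit_b(keep_mask)
```

The aggregation step is the key point: influence is summed over each user's responses, so a user whose entire response pattern perturbs the estimates is removed as a whole, rather than cherry-picking individual responses.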
Related papers
- Fine-tuning can Help Detect Pretraining Data from Large Language Models [7.7209640786782385]
Current methods differentiate members and non-members by designing scoring functions, like Perplexity and Min-k%.
We introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection.
arXiv Detail & Related papers (2024-10-09T15:36:42Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z) - On the Universal Adversarial Perturbations for Efficient Data-free
Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z) - BOBCAT: Bilevel Optimization-Based Computerized Adaptive Testing [3.756550107432323]
Computerized adaptive testing (CAT) refers to a form of tests that are personalized to every student/test taker.
We propose BOBCAT, a Bilevel Optimization-Based framework for CAT to directly learn a data-driven question selection algorithm from training data.
arXiv Detail & Related papers (2021-08-17T00:40:23Z) - Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC points and 12.5 average precision points.
arXiv Detail & Related papers (2021-06-11T01:36:08Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.