TestAgent: An Adaptive and Intelligent Expert for Human Assessment
- URL: http://arxiv.org/abs/2506.03032v1
- Date: Tue, 03 Jun 2025 16:07:54 GMT
- Title: TestAgent: An Adaptive and Intelligent Expert for Human Assessment
- Authors: Junhao Yu, Yan Zhuang, YuXuan Sun, Weibo Gao, Qi Liu, Mingyue Cheng, Zhenya Huang, Enhong Chen
- Abstract summary: We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
- Score: 62.060118490577366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately assessing internal human states is key to understanding preferences, offering personalized services, and identifying challenges in real-world applications. Originating from psychometrics, adaptive testing has become the mainstream method for human measurement and has now been widely applied in education, healthcare, sports, and sociology. It customizes assessments by selecting the fewest test questions required. However, current adaptive testing methods face several challenges. The mechanized nature of most algorithms leads to guessing behavior and difficulties with open-ended questions. Additionally, subjective assessments suffer from noisy response data and coarse-grained test outputs, further limiting their effectiveness. To move closer to an ideal adaptive testing process, we propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. This is the first application of LLMs in adaptive testing. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions. Experiments on psychological, educational, and lifestyle assessments show our approach achieves more accurate results with 20% fewer questions than state-of-the-art baselines, and testers preferred it in speed, smoothness, and other dimensions.
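As background for the adaptive-testing loop the abstract describes, here is a minimal sketch of a classical computerized adaptive test, not TestAgent's LLM-driven procedure: a 2PL item response theory (IRT) model, maximum-Fisher-information item selection, and a grid MAP ability estimate. The item bank, function names, and the deterministic toy test-taker are illustrative assumptions.

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability of a correct response for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    # an item is most informative when its difficulty is near theta
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses):
    """MAP estimate of ability on a coarse grid with a N(0, 1) prior."""
    grid = [i / 10.0 for i in range(-40, 41)]
    def log_post(t):
        ll = -0.5 * t * t  # log-prior, up to a constant
        for a, b, y in responses:
            p = p_correct(t, a, b)
            ll += math.log(p if y else 1.0 - p)
        return ll
    return max(grid, key=log_post)

def run_cat(item_bank, answer_fn, max_items=5):
    """Adaptive loop: repeatedly administer the unused item with maximal
    Fisher information at the current ability estimate, then re-estimate."""
    theta, responses = 0.0, []
    remaining = list(item_bank)
    for _ in range(min(max_items, len(remaining))):
        a, b = max(remaining, key=lambda it: fisher_information(theta, *it))
        remaining.remove((a, b))
        responses.append((a, b, answer_fn(a, b)))
        theta = estimate_theta(responses)
    return theta

# toy bank: unit discrimination, difficulties from -3.0 to 3.0
bank = [(1.0, b / 2.0) for b in range(-6, 7)]
# deterministic test-taker: answers correctly whenever the item is
# easier than their (unobserved) ability of 1.0
theta_hat = run_cat(bank, lambda a, b: b < 1.0)
```

The "mechanized" selection rule in the `max` call is exactly what the abstract critiques: it cannot probe the reasoning behind an answer or handle open-ended responses, which is the gap a conversational agent aims to fill.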
Related papers
- A Forced-Choice Neural Cognitive Diagnostic Model of Personality Testing [12.122796840818577]
This study presents a deep learning-based Forced-Choice Neural Cognitive Diagnostic Model (FCNCD). To account for the unidimensionality of items in forced-choice tests, we create interpretable participant and item parameters. The FCNCD's effectiveness is validated by experiments on real-world and simulated datasets.
arXiv Detail & Related papers (2025-07-20T15:39:36Z)
- Unveiling Assumptions: Exploring the Decisions of AI Chatbots and Human Testers [2.5327705116230477]
Decision-making relies on a variety of information, including code, requirements specifications, and other software artifacts.
To fill in the gaps left by unclear information, we often rely on assumptions, intuition, or previous experiences to make decisions.
arXiv Detail & Related papers (2024-06-17T08:55:56Z)
- Survey of Computerized Adaptive Testing: A Machine Learning Perspective [66.26687542572974]
Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees.
This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method.
arXiv Detail & Related papers (2024-03-31T15:09:47Z)
- InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment [1.6874375111244329]
We propose incorporating interactive readability assessments made by a tester into EvoSuite.
Our approach, InterEvo-TR, interacts with the tester at different moments during the search.
Our results show that the strategy to select and present intermediate results is effective for the purpose of readability assessment.
arXiv Detail & Related papers (2024-01-13T13:14:29Z)
- ALBA: Adaptive Language-based Assessments for Mental Health [7.141164121152202]
This work introduces the task of Adaptive Language-Based Assessment ALBA.
It involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions.
We found ALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions.
arXiv Detail & Related papers (2023-11-11T03:37:17Z)
- Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Hybrid Intelligent Testing in Simulation-Based Verification [0.0]
Several millions of tests may be required to achieve coverage goals.
Coverage-Directed Test Selection learns from coverage feedback to bias testing towards the most effective tests.
Novelty-Driven Verification learns to identify and simulate stimuli that differ from previous stimuli.
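The novelty-driven idea above can be illustrated with a toy stand-in: score each candidate stimulus by its distance to the nearest previously simulated stimulus, and simulate the most novel one. The actual paper presumably uses learned novelty models; the Hamming-distance scorer and all names here are illustrative assumptions.

```python
def hamming(x, y):
    # number of positions where two equal-length stimuli differ
    return sum(a != b for a, b in zip(x, y))

def novelty(candidate, history):
    # distance to the nearest previously simulated stimulus
    return min((hamming(candidate, h) for h in history), default=float("inf"))

def pick_most_novel(candidates, history):
    # greedy selection: simulate the stimulus least similar to past stimuli
    return max(candidates, key=lambda c: novelty(c, history))

history = [(0, 0, 0, 0), (1, 1, 1, 1)]
candidates = [(0, 0, 0, 1), (1, 1, 0, 0)]
chosen = pick_most_novel(candidates, history)
```

Here `(1, 1, 0, 0)` is chosen because it sits at distance 2 from everything in the history, while `(0, 0, 0, 1)` is only one bit away from a past stimulus.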
arXiv Detail & Related papers (2022-05-19T13:22:08Z)
- A New Score for Adaptive Tests in Bayesian and Credal Networks [64.80185026979883]
A test is adaptive when its sequence and number of questions is dynamically tuned on the basis of the estimated skills of the taker.
We present an alternative family of scores, based on the mode of the posterior probabilities, and hence easier to explain.
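A mode-based score of the kind described above can be sketched as follows, under simplifying assumptions: a single discrete skill variable, identical per-question likelihoods, and independent responses. The level names and probabilities are illustrative, not taken from the paper.

```python
def posterior_over_skill(prior, p_correct, answers):
    """Bayesian update of a discrete skill variable.

    prior: dict mapping skill level -> prior probability
    p_correct: dict mapping skill level -> probability of a correct
               answer (assumed identical for every question, for brevity)
    answers: sequence of 0/1 observed responses
    """
    post = dict(prior)
    for y in answers:
        for s in post:
            p = p_correct[s]
            post[s] *= p if y else (1.0 - p)
    z = sum(post.values())
    return {s: v / z for s, v in post.items()}

def mode_score(posterior):
    # report the most probable skill level rather than an expectation:
    # the (simplified) idea behind mode-based scores
    return max(posterior, key=posterior.get)

prior = {"low": 1 / 3, "mid": 1 / 3, "high": 1 / 3}
p_corr = {"low": 0.2, "mid": 0.5, "high": 0.8}
post = posterior_over_skill(prior, p_corr, [1, 1, 0, 1])
```

The appeal to explainability is visible here: "the most likely level is `high`" is easier to communicate to a test-taker than a weighted average over levels.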
arXiv Detail & Related papers (2021-05-25T20:35:42Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
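Dorfman's classical two-stage scheme, which the group-testing paper above builds on, is easy to quantify in the noiseless case: pool g people into one test, and retest individuals only if the pool is positive. This sketch covers only that noiseless baseline, not the paper's noisy Bayesian extension.

```python
def expected_tests_per_person(p, g):
    """Dorfman two-stage pooling with infection prevalence p: one pooled
    test per group of g people, plus g individual retests whenever the
    pool is positive (probability 1 - (1-p)**g)."""
    return 1.0 / g + 1.0 - (1.0 - p) ** g

def best_group_size(p, max_g=100):
    # group size minimizing the expected number of tests per person
    return min(range(2, max_g + 1), key=lambda g: expected_tests_per_person(p, g))
```

For a prevalence of 1%, the optimal group size works out to 11, needing under 0.2 tests per person, roughly a fivefold saving over individual testing, which is why pooling pays off when prevalence is low.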
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.