Implicit assessment of language learning during practice as accurate as explicit testing
- URL: http://arxiv.org/abs/2409.16133v1
- Date: Tue, 24 Sep 2024 14:40:44 GMT
- Title: Implicit assessment of language learning during practice as accurate as explicit testing
- Authors: Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber
- Abstract summary: We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts.
We first aim to replace exhaustive tests with efficient but accurate adaptive tests.
Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing.
- Score: 0.5749787074942512
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. We use learner data collected from exhaustive tests under imperfect conditions, to train an IRT model to guide adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to *linguistic constructs*; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests vs. those from exercises. The experiments confirm that the IRT models can produce accurate ability estimation based on exercises.
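The paper itself does not include code; as a rough, self-contained sketch of the kind of IRT machinery it relies on, the snippet below estimates a learner's ability by maximum likelihood under a simple Rasch (1PL) model, treating each linguistic construct as an "item" with a known difficulty. The function names, the use of scipy, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_ability(responses, difficulties):
    """MLE of learner ability under a Rasch (1PL) model.

    responses   : array of 0/1 outcomes, one per attempted item (construct)
    difficulties: array of item difficulty parameters, same length
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)

    def neg_log_likelihood(theta):
        # P(correct | theta) under the Rasch model: sigmoid(theta - difficulty)
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

    result = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded")
    return result.x

# Example: a learner who succeeds on easier constructs but fails harder ones.
difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])
responses = np.array([1, 1, 1, 0, 0])
print(f"estimated ability: {estimate_ability(responses, difficulties):+.2f}")
```

In the paper's exercise-based setting, the responses would come from practice exercises linked to constructs rather than from explicit test questions; the estimation step stays the same.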
Related papers
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free approaches.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
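The BoostAdapter entry above mentions a light-weight key-value memory for feature retrieval. Purely as an illustrative sketch (not the authors' code, with all class and parameter names invented here), such a memory could be a small cosine-similarity cache:

```python
import numpy as np

class KeyValueMemory:
    """Tiny illustrative cache: keys are L2-normalised feature vectors, values are e.g. pseudo-labels."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.keys, self.values = [], []

    def add(self, feature, value):
        self.keys.append(feature / (np.linalg.norm(feature) + 1e-8))
        self.values.append(value)
        if len(self.keys) > self.capacity:        # drop the oldest entry when full
            self.keys.pop(0)
            self.values.pop(0)

    def retrieve(self, query, k=8):
        """Return the values and cosine similarities of the k most similar cached features."""
        if not self.keys:
            return [], np.array([])
        query = query / (np.linalg.norm(query) + 1e-8)
        sims = np.stack(self.keys) @ query
        top = np.argsort(sims)[-k:][::-1]
        return [self.values[i] for i in top], sims[top]

memory = KeyValueMemory()
memory.add(np.array([1.0, 0.0]), "class_a")
memory.add(np.array([0.0, 1.0]), "class_b")
print(memory.retrieve(np.array([0.9, 0.1]), k=1))
```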
- Training on the Test Task Confounds Evaluation and Emergence [16.32378359459614]
We show that training on the test task confounds both relative model evaluations and claims about emergent capabilities.
We propose an effective method to adjust for training on the test task by fine-tuning each model under comparison on the same task-relevant data before evaluation.
arXiv Detail & Related papers (2024-07-10T17:57:58Z)
- Natural Language Processing Through Transfer Learning: A Case Study on Sentiment Analysis [1.14219428942199]
This paper explores the potential of transfer learning in natural language processing focusing mainly on sentiment analysis.
The claim is that, compared to training models from scratch, transfer learning, using pre-trained BERT models, can increase sentiment classification accuracy.
arXiv Detail & Related papers (2023-11-28T17:12:06Z)
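The transfer-learning study above fine-tunes pre-trained BERT models for sentiment classification. A generic sketch of such fine-tuning with the Hugging Face Trainer follows; the checkpoint, the IMDB stand-in dataset, and the hyperparameters are placeholders, not the paper's setup.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "bert-base-uncased"          # placeholder pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Any dataset with "text" and "label" columns would do; IMDB is used as a stand-in here.
dataset = load_dataset("imdb")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(output_dir="sentiment-bert", num_train_epochs=2,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
print(trainer.evaluate())
```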
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems, but their confidence estimates are often poorly calibrated.
Previous work shows that introducing an extra calibration task can mitigate this issue.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Amortised Design Optimization for Item Response Theory [5.076871870091048]
In education, Item Response Theory (IRT) is used to infer student abilities and characteristics of test items from student responses.
Selecting informative test items adaptively is computationally expensive; in response, we propose incorporating amortised experimental design into IRT.
The computational cost is shifted to a precomputing phase by training a Deep Reinforcement Learning (DRL) agent with synthetic data.
arXiv Detail & Related papers (2023-07-19T10:42:56Z)
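The amortised DRL design itself is not sketched here. For context, the step it aims to amortise, choosing the next test item adaptively, is classically done by picking the item with maximum Fisher information at the current ability estimate, which is also the kind of logic an IRT-guided adaptive test such as the one in the main paper typically uses. A minimal Rasch-model version, with all names illustrative:

```python
import numpy as np

def item_information(theta, difficulty):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def select_next_item(theta_hat, difficulties, asked):
    """Choose the not-yet-asked item that is most informative at the current ability estimate."""
    info = np.array([
        -np.inf if i in asked else item_information(theta_hat, d)
        for i, d in enumerate(difficulties)
    ])
    return int(np.argmax(info))

difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# With item 2 already asked, the item closest in difficulty to theta_hat = 0.3 is selected.
print(select_next_item(theta_hat=0.3, difficulties=difficulties, asked={2}))
```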
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
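The NDV entry frames estimation as supervised learning over sample statistics. The sketch below illustrates that framing only: it featurises a random sample by its frequency profile and regresses the log of the true distinct count on synthetic columns. The features, the model choice, and the data generation are assumptions for illustration, not the paper's design.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestRegressor

def sample_features(sample, max_freq=5):
    """Frequency profile of a sample: how many values occur once, twice, ..., plus sample size."""
    freq_of_freq = Counter(Counter(sample.tolist()).values())
    return [freq_of_freq.get(i, 0) for i in range(1, max_freq + 1)] + [len(sample)]

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(500):                                   # synthetic training columns
    true_ndv = int(rng.integers(10, 2000))
    column = rng.integers(0, true_ndv, size=50_000)    # column with at most true_ndv distinct values
    sample = rng.choice(column, size=1_000, replace=False)
    X.append(sample_features(sample))
    y.append(np.log(len(np.unique(column))))           # regress the log of the true NDV

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("predicted NDV for the last column:",
      round(np.exp(model.predict([sample_features(sample)])[0])))
```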
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- BERT Fine-Tuning for Sentiment Analysis on Indonesian Mobile Apps Reviews [1.5749416770494706]
This study examines the effectiveness of fine-tuning BERT for sentiment analysis using two different pre-trained models.
The dataset used is Indonesian user reviews of the ten best apps of 2020 on Google Play.
Two training-data labeling approaches, score-based and lexicon-based, were also tested to determine the effectiveness of the model.
arXiv Detail & Related papers (2021-07-14T16:00:15Z)
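The Indonesian-reviews study above compares score-based and lexicon-based labeling of the training data. A toy illustration of the two labeling rules is given below; the tiny lexicon and the rating thresholds are invented for the example and are not the resources used in the paper.

```python
POSITIVE = {"bagus", "mantap", "keren", "suka"}      # tiny illustrative lexicon
NEGATIVE = {"jelek", "lambat", "error", "buruk"}

def score_based_label(star_rating):
    """Label a review from its star rating: 4-5 positive, 1-2 negative, 3 discarded."""
    if star_rating >= 4:
        return "positive"
    if star_rating <= 2:
        return "negative"
    return None

def lexicon_based_label(text):
    """Label a review by counting sentiment words from the lexicon."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None

print(score_based_label(5), lexicon_based_label("aplikasi bagus dan keren"))
```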
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Learning by Passing Tests, with Application to Neural Architecture Search [19.33620150924791]
We propose a novel learning approach called learning by passing tests.
A tester model creates increasingly more-difficult tests to evaluate a learner model.
The learner continuously tries to improve its learning ability so that it can pass the increasingly difficult tests created by the tester.
arXiv Detail & Related papers (2020-11-30T18:33:34Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
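The last entry's point is that rerunning an identical fine-tuning recipe with different random seeds (hence different weight initializations and data orders) produces widely varying results. A hedged sketch of such a seed sweep with the Hugging Face Trainer follows; the checkpoint, the RTE task, and the hyperparameters are placeholders rather than the paper's exact configuration.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, set_seed)
from datasets import load_dataset
import numpy as np

model_name = "bert-base-cased"                       # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("glue", "rte")                # one GLUE task as a stand-in
dataset = dataset.map(
    lambda b: tokenizer(b["sentence1"], b["sentence2"], truncation=True,
                        padding="max_length", max_length=128),
    batched=True,
)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

scores = []
for seed in range(5):                                # vary only the random seed
    set_seed(seed)                                   # controls initialization and data order
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    args = TrainingArguments(output_dir=f"run-seed-{seed}", num_train_epochs=3,
                             per_device_train_batch_size=16, seed=seed)
    trainer = Trainer(model=model, args=args, compute_metrics=accuracy,
                      train_dataset=dataset["train"], eval_dataset=dataset["validation"])
    trainer.train()
    scores.append(trainer.evaluate()["eval_accuracy"])

print(f"best {max(scores):.3f}, worst {min(scores):.3f} over {len(scores)} seeds")
```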
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.