Generative Data Imputation for Sparse Learner Performance Data Using Generative Adversarial Imputation Networks
- URL: http://arxiv.org/abs/2503.18982v2
- Date: Sun, 13 Apr 2025 21:04:27 GMT
- Title: Generative Data Imputation for Sparse Learner Performance Data Using Generative Adversarial Imputation Networks
- Authors: Liang Zhang, Jionghao Lin, John Sabatini, Diego Zapata-Rivera, Carol Forsyth, Yang Jiang, John Hollander, Xiangen Hu, Arthur C. Graesser
- Abstract summary: Missing responses due to skips or incomplete attempts create data sparsity. We propose a generative imputation approach using Generative Adversarial Imputation Networks (GAIN). Our method features a three-dimensional (3D) framework (learners, questions, and attempts), flexibly accommodating various sparsity levels.
- Score: 3.0800525961862992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learner performance data collected by Intelligent Tutoring Systems (ITSs), such as responses to questions, is essential for modeling and predicting learners' knowledge states. However, missing responses due to skips or incomplete attempts create data sparsity, challenging accurate assessment and personalized instruction. To address this, we propose a generative imputation approach using Generative Adversarial Imputation Networks (GAIN). Our method features a three-dimensional (3D) framework (learners, questions, and attempts), flexibly accommodating various sparsity levels. Enhanced by convolutional neural networks and optimized with a least squares loss function, the GAIN-based method aligns input and output dimensions to question-attempt matrices along the learners' dimension. Extensive experiments using datasets from AutoTutor Adult Reading Comprehension (ARC), ASSISTments, and MATHia demonstrate that our approach significantly outperforms tensor factorization and alternative GAN methods in imputation accuracy across different attempt scenarios. Bayesian Knowledge Tracing (BKT) further validates the effectiveness of the imputed data by estimating learning parameters: initial knowledge (P(L0)), learning rate (P(T)), guess rate (P(G)), and slip rate (P(S)). Results indicate the imputed data enhances model fit and closely mirrors original distributions, capturing underlying learning behaviors reliably. Kullback-Leibler (KL) divergence assessments confirm minimal divergence, showing the imputed data preserves essential learning characteristics effectively. These findings underscore GAIN's capability as a robust imputation tool in ITSs, alleviating data sparsity and supporting adaptive, individualized instruction, ultimately leading to more precise and responsive learner assessments and improved educational outcomes.
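To make the mechanics concrete, below is a minimal PyTorch sketch of a GAIN-style training loop in the spirit of the abstract: each row is one learner's flattened question-by-attempt matrix, a binary mask marks observed responses, the hint mechanism follows the original GAIN recipe, and both objectives use the least-squares (LSGAN-style) form the paper adopts. The MLP stand-ins for the paper's convolutional networks, the toy data, and all hyperparameters are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

N_Q, N_A = 20, 5       # questions x attempts per learner (assumed sizes)
DIM = N_Q * N_A

class MLP(nn.Module):
    """Stand-in for the paper's convolutional generator/discriminator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM * 2, 128), nn.ReLU(),
            nn.Linear(128, DIM), nn.Sigmoid())
    def forward(self, x, m):
        return self.net(torch.cat([x, m], dim=1))

G, D = MLP(), MLP()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

# Toy data: 64 learners, binary correctness, ~50% of entries observed.
x = torch.randint(0, 2, (64, DIM)).float()
m = (torch.rand(64, DIM) < 0.5).float()      # mask: 1 = observed

for step in range(200):
    z = torch.rand(64, DIM)                  # noise fills missing slots
    x_hat = G(m * x + (1 - m) * z, m)        # generator's full matrix
    x_imp = m * x + (1 - m) * x_hat          # observed values kept as-is
    b = (torch.rand(64, DIM) < 0.9).float()  # GAIN hint mechanism
    hint = b * m + 0.5 * (1 - b)

    # Least-squares discriminator loss (the paper swaps cross-entropy
    # for a least squares objective).
    loss_d = ((D(x_imp.detach(), hint) - m) ** 2).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool D on missing entries + reconstruct observed ones.
    d_prob = D(x_imp, hint)
    loss_g = (((1 - m) * (d_prob - 1)) ** 2).mean() + \
             10.0 * ((m * (x - x_hat)) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In the paper's setup, the generator and discriminator are convolutional networks whose input and output dimensions are aligned to the question-attempt matrices along the learner dimension, and the imputed matrices then feed BKT to estimate P(L0), P(T), P(G), and P(S).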
Related papers
- Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI [17.242331892899543]
Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning. Learning performance data tend to be highly sparse (80%-90% missing observations) in most real-world applications due to adaptive item selection. This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data.
arXiv Detail & Related papers (2024-09-24T00:25:07Z)
- Generative Adversarial Networks for Imputing Sparse Learning Performance [3.0350058108125646]
This paper proposes using the Generative Adversarial Imputation Networks (GAIN) framework to impute sparse learning performance data.
Our customized GAIN-based method imputes sparse data in a 3D tensor space.
This finding enhances comprehensive learning data modeling and analytics in AI-based education.
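As a small illustration of that 3D setup, the sketch below packs sparse response logs into a (learners x questions x attempts) tensor plus an observation mask, so each learner slice is a question-attempt matrix of the kind GAIN imputes; the log format and sizes are assumptions for illustration.

```python
import numpy as np

# (learner_id, question_id, attempt_no, correct) tuples -- assumed log format
log = [(0, 1, 0, 1), (0, 1, 1, 0), (1, 3, 0, 1), (2, 0, 2, 1)]

n_learners, n_questions, n_attempts = 3, 5, 3
x = np.zeros((n_learners, n_questions, n_attempts))  # response tensor
m = np.zeros_like(x)                                 # mask: 1 = observed

for lid, qid, att, correct in log:
    x[lid, qid, att] = correct
    m[lid, qid, att] = 1.0

print(f"sparsity: {1 - m.mean():.0%} missing")
learner_0 = x[0]  # one question-attempt matrix: the unit the model imputes
```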
arXiv Detail & Related papers (2024-07-26T17:09:48Z)
- Improvement of Applicability in Student Performance Prediction Based on Transfer Learning [2.3290007848431955]
This study proposes a method to improve prediction accuracy by employing transfer learning techniques on the dataset with varying distributions.
The model was trained and evaluated to enhance its generalization ability and prediction accuracy.
Experiments demonstrated that this approach excels in reducing Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
The results demonstrate that freezing more layers improves performance for complex and noisy data, whereas freezing fewer layers is more effective for simpler and larger datasets.
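As an illustration of that freezing trade-off, here is a hedged sketch using torchvision's resnet18 purely as a stand-in; the paper's actual architecture and datasets are not specified in this summary.

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT")             # generic pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 1)   # regression head (RMSE/MAE)

def freeze_first(model, n_stages):
    """Freeze the stem plus the first n_stages residual stages."""
    stages = [model.conv1, model.bn1, model.layer1,
              model.layer2, model.layer3, model.layer4]
    for stage in stages[: 2 + n_stages]:
        for p in stage.parameters():
            p.requires_grad = False

freeze_first(model, n_stages=3)    # freeze more: complex / noisy target data
# freeze_first(model, n_stages=1)  # freeze less: simpler, larger target data
trainable = [p for p in model.parameters() if p.requires_grad]  # to optimizer
```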
arXiv Detail & Related papers (2024-06-01T13:09:05Z)
- Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning [1.2248793682283963]
This study aims to tackle data shortage issues in student learning records to enhance DKT performance for personalized adaptive learning (PAL).
It employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT.
The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance.
arXiv Detail & Related papers (2024-04-25T00:23:20Z)
- Exploring Learning Complexity for Efficient Downstream Dataset Pruning [8.990878450631596]
Existing dataset pruning methods require training on the entire dataset.
We propose a straightforward, novel, and training-free hardness score named Distorting-based Learning Complexity (DLC).
Our method is motivated by the observation that easy samples learned faster can also be learned with fewer parameters.
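In that spirit, here is a hedged sketch of a distortion-based hardness score: perturb a model's weights several times and average each sample's loss under the distorted copies, so samples that remain easy under distortion score low. The noise scale, trial count, and toy model are illustrative assumptions, not the paper's exact DLC procedure.

```python
import copy
import torch
import torch.nn as nn

def dlc_scores(model, xs, ys, n_trials=8, noise=0.05):
    """Average per-sample loss under randomly distorted copies of the model."""
    loss_fn = nn.CrossEntropyLoss(reduction="none")
    scores = torch.zeros(len(xs))
    for _ in range(n_trials):
        distorted = copy.deepcopy(model)
        with torch.no_grad():
            for p in distorted.parameters():
                p.add_(noise * p.std() * torch.randn_like(p))  # distort weights
            scores += loss_fn(distorted(xs), ys)
    return scores / n_trials  # higher = harder to learn

# Toy usage: score 100 random samples with a tiny classifier.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
xs, ys = torch.randn(100, 10), torch.randint(0, 3, (100,))
hardness = dlc_scores(model, xs, ys)
keep = hardness.argsort(descending=True)[:50]  # e.g. keep the 50 hardest
```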
arXiv Detail & Related papers (2024-02-08T02:29:33Z)
- 3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems [22.70004627901319]
We introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models.
The framework effectively generated scalable, personalized simulations of learning performance.
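As a sketch of the densification step, the snippet below runs a masked CP (PARAFAC) factorization of a sparse learner tensor, assuming tensorly's parafac and its mask argument for missing entries; the rank and tensor sizes are illustrative, not the paper's settings.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

x = np.random.rand(30, 20, 5)            # learners x questions x attempts
mask = np.random.rand(*x.shape) < 0.2    # ~20% of entries observed

# Masked CP factorization: unobserved entries are ignored via the mask and
# filled in by the low-rank reconstruction.
cp = parafac(tl.tensor(x * mask), rank=3, mask=tl.tensor(mask))
dense = tl.cp_to_tensor(cp)              # densified tensor, ready for the
                                         # generative-model stage of 3DG
```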
arXiv Detail & Related papers (2024-01-29T22:34:01Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAIN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique for estimating the unknown probability distributions of the data samples in the output feature space.
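For reference, here is a compact sketch of that k-NN density estimate: in a d-dimensional feature space, p(x) is approximated by k / (n * V), where V is the volume of the d-ball reaching the k-th nearest neighbor; the feature dimension and k below are illustrative choices.

```python
import numpy as np
from math import gamma, pi

def knn_density(feats, k=10):
    """k-NN density estimate: p(x) ~= k / (n * V(r_k)), V = d-ball volume."""
    n, d = feats.shape
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    r_k = np.sort(dists, axis=1)[:, k]       # k-th neighbor (index 0 is self)
    vol = (pi ** (d / 2) / gamma(d / 2 + 1)) * r_k ** d
    return k / (n * vol)

feats = np.random.randn(200, 8)   # e.g. penultimate-layer outputs
p_hat = knn_density(feats)        # one density estimate per sample
```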
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.