Knowledge Transfer by Discriminative Pre-training for Academic
Performance Prediction
- URL: http://arxiv.org/abs/2107.04009v1
- Date: Mon, 28 Jun 2021 13:02:23 GMT
- Title: Knowledge Transfer by Discriminative Pre-training for Academic
Performance Prediction
- Authors: Byungsoo Kim, Hangyeol Yu, Dongmin Shin, Youngduck Choi
- Abstract summary: We propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction.
Compared to the previous state-of-the-art generative pre-training method, DPA is more sample-efficient, converging faster to a lower academic performance prediction error.
- Score: 5.3431413737671525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The need to precisely estimate a student's academic performance has been
emphasized with the increasing attention paid to Intelligent Tutoring Systems
(ITS). However, since labels for academic performance, such as test scores, are
collected outside the ITS, obtaining them is costly, leading to a label-scarcity
problem that makes machine learning approaches to academic performance
prediction challenging. To this end, inspired by recent advances in pre-training
methods in the natural language processing community, we propose DPA, a transfer
learning framework with Discriminative Pre-training tasks for Academic
performance prediction. DPA pre-trains two models, a generator and a
discriminator, and fine-tunes the discriminator on academic performance
prediction. In DPA's pre-training phase, a sequence of interactions in which
some tokens are masked is fed to the generator, which is trained to reconstruct
the original sequence. The discriminator then takes an interaction sequence in
which the masked tokens are replaced by the generator's outputs and is trained
to predict, for every token in the sequence, whether it is original or replaced.
Compared to the previous state-of-the-art generative pre-training method, DPA is
more sample-efficient, converging faster to a lower academic performance
prediction error. We conduct extensive experimental studies on a real-world
dataset obtained from a multi-platform ITS application and show that DPA
outperforms the previous state-of-the-art generative pre-training method,
reducing mean absolute error by 4.05% while being more robust to increased
label scarcity.
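Below is a minimal sketch of the ELECTRA-style pre-training objective described in the abstract, assuming a PyTorch-like setup: a generator reconstructs masked interaction tokens, and a discriminator classifies every token of the generator-corrupted sequence as original or replaced. The module sizes, masking rate, vocabulary, and unweighted loss sum are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Illustrative sketch of DPA-style discriminative pre-training (assumptions:
# toy Transformer encoders, a 15% masking rate, a single interaction-token
# vocabulary, and an unweighted sum of the two losses).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, MASK_ID, MASK_RATE = 1000, 128, 0, 0.15

class TokenEncoder(nn.Module):
    """Toy Transformer encoder over sequences of interaction tokens."""
    def __init__(self, vocab_size, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                        # tokens: (B, T) long
        return self.encoder(self.embed(tokens))       # (B, T, HIDDEN)

class Generator(nn.Module):
    """Predicts the original token at every position of a masked sequence."""
    def __init__(self):
        super().__init__()
        self.body = TokenEncoder(VOCAB_SIZE, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens):
        return self.head(self.body(tokens))           # (B, T, VOCAB_SIZE) logits

class Discriminator(nn.Module):
    """Predicts, per token, whether it is original (1) or replaced (0)."""
    def __init__(self):
        super().__init__()
        self.body = TokenEncoder(VOCAB_SIZE, HIDDEN)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, tokens):
        return self.head(self.body(tokens)).squeeze(-1)   # (B, T) logits

def pretraining_step(generator, discriminator, tokens):
    """One pre-training step on a batch of interaction sequences (B, T)."""
    # 1. Mask a random subset of tokens.
    mask = torch.rand(tokens.shape) < MASK_RATE
    corrupted = tokens.masked_fill(mask, MASK_ID)

    # 2. Generator loss: reconstruct the original tokens at masked positions.
    gen_logits = generator(corrupted)
    gen_loss = F.cross_entropy(gen_logits[mask], tokens[mask])

    # 3. Replace masked positions with the generator's predictions.
    with torch.no_grad():
        replaced = torch.where(mask, gen_logits.argmax(dim=-1), tokens)

    # 4. Discriminator loss: classify every token as original vs. replaced.
    is_original = (replaced == tokens).float()
    disc_logits = discriminator(replaced)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_original)

    return gen_loss + disc_loss

# Toy usage: two sequences of 20 interaction tokens each.
gen, disc = Generator(), Discriminator()
batch = torch.randint(1, VOCAB_SIZE, (2, 20))
loss = pretraining_step(gen, disc, batch)
loss.backward()
```
In the fine-tuning stage described in the abstract, only the pre-trained discriminator would be kept, with a regression head attached to predict the academic performance label (e.g., a test score).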
Related papers
- Reconsidering Degeneration of Token Embeddings with Definitions for Encoder-based Pre-trained Language Models [20.107727903240065]
We propose DefinitionEMB to re-construct isotropically distributed and semantics-related token embeddings for encoder-based language models.
Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to re-construct such embeddings.
arXiv Detail & Related papers (2024-08-02T15:00:05Z)
- Federated Class-Incremental Learning with Hierarchical Generative Prototypes [10.532838477096055]
Federated Learning (FL) aims at unburdening the training of deep models by distributing computation across multiple devices (clients).
Our proposal constrains both biases in the last layer by efficiently finetuning a pre-trained backbone using learnable prompts.
Our method significantly improves the current state of the art, providing an average increase of +7.8% in accuracy.
arXiv Detail & Related papers (2024-06-04T16:12:27Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Augmentation-Aware Self-Supervision for Data-Efficient GAN Training [68.81471633374393]
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.
We propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data.
We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures.
arXiv Detail & Related papers (2022-05-31T10:35:55Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Diversity Enhanced Active Learning with Strictly Proper Scoring Rules [4.81450893955064]
We study acquisition functions for active learning (AL) for text classification.
We convert the Expected Loss Reduction (ELR) method to estimate the increase in (strictly proper) scores like log probability or negative mean square error.
We show that the use of mean square error and log probability with BEMPS yields robust acquisition functions.
arXiv Detail & Related papers (2021-10-27T05:02:11Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.