No Length Left Behind: Enhancing Knowledge Tracing for Modeling
Sequences of Excessive or Insufficient Lengths
- URL: http://arxiv.org/abs/2308.03488v1
- Date: Mon, 7 Aug 2023 11:30:58 GMT
- Title: No Length Left Behind: Enhancing Knowledge Tracing for Modeling
Sequences of Excessive or Insufficient Lengths
- Authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui
Zhao
- Abstract summary: Knowledge tracing aims to predict students' responses to practices based on their historical question-answering behaviors.
As sequences get longer, computational costs will increase exponentially.
We propose a model called Sequence-Flexible Knowledge Tracing (SFKT).
- Score: 3.2687390531088414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge tracing (KT) aims to predict students' responses to practices based
on their historical question-answering behaviors. However, most current KT
methods focus on improving overall AUC, leaving ample room for optimization in
modeling sequences of excessive or insufficient lengths. As sequences get
longer, computational costs will increase exponentially. Therefore, KT methods
usually truncate sequences to an acceptable length, which makes it difficult
for models deployed in online service systems to capture the complete historical
practice behaviors of students with overly long sequences. Conversely, modeling students
with short practice sequences using most KT methods may result in overfitting
due to limited observation samples. To address the above limitations, we
propose a model called Sequence-Flexible Knowledge Tracing (SFKT).
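To make the limitation concrete, the sketch below shows the fixed-length preprocessing that the abstract criticizes in conventional KT pipelines: every practice sequence is capped at a maximum length, so long histories lose their earliest interactions and short histories become mostly padding. This is a minimal illustrative sketch under our own assumptions (the function name `prepare_sequence`, `PAD_ID`, and `max_len=200` are hypothetical and not from the paper), not an implementation of SFKT.

```python
# Minimal sketch (not from the paper) of fixed-length sequence preprocessing
# commonly used by KT models. All names and values here are hypothetical.

from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token for question ids / responses


def prepare_sequence(
    question_ids: List[int],
    responses: List[int],
    max_len: int = 200,
) -> Tuple[List[int], List[int], List[int]]:
    """Truncate or left-pad one student's practice history to `max_len`.

    Returns (question_ids, responses, mask), each of length `max_len`.
    """
    assert len(question_ids) == len(responses)

    # Excessive length: keep only the most recent `max_len` interactions,
    # discarding the earlier part of the student's history.
    q = question_ids[-max_len:]
    r = responses[-max_len:]

    # Insufficient length: pad on the left so every sequence has the same
    # shape; the mask marks which positions are real observations.
    pad = max_len - len(q)
    mask = [0] * pad + [1] * len(q)
    q = [PAD_ID] * pad + q
    r = [PAD_ID] * pad + r
    return q, r, mask


if __name__ == "__main__":
    # A long history (500 steps) loses its first 300 interactions ...
    long_q, long_r = list(range(1, 501)), [1] * 500
    q, r, m = prepare_sequence(long_q, long_r, max_len=200)
    print(q[0], sum(m))    # 301 200 -> earlier behavior is no longer visible

    # ... while a short history (5 steps) is mostly padding.
    q, r, m = prepare_sequence([3, 7, 7, 9, 3], [1, 0, 1, 1, 0], max_len=200)
    print(sum(m), len(m))  # 5 200
```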
Related papers
- SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models [16.060402139507644]
SWITCH (Studying WIth TeaCHer for Knowledge Distillation) is a novel approach that strategically incorporates the teacher model during the student's sequence generation.
We show that SWITCH surpasses traditional Knowledge Distillation methods, particularly excelling in the generation of long sequential data.
arXiv Detail & Related papers (2024-10-25T12:10:49Z)
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem from the lens of Seq FT.
Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and a Classifier Alignment to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling [52.404072802235234]
We introduce Chunked Instruction-aware State Eviction (CItruS), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states.
Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget.
arXiv Detail & Related papers (2024-06-17T18:34:58Z)
- Long Range Propagation on Continuous-Time Dynamic Graphs [18.5534584418248]
Continuous-Time Graph Anti-Symmetric Network (CTAN) is designed for efficient propagation of information.
We show that CTAN's empirical performance on synthetic long-range benchmarks and real-world benchmarks is superior to that of other methods.
arXiv Detail & Related papers (2024-06-04T19:42:19Z)
- Mitigating Catastrophic Forgetting in Task-Incremental Continual Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a Supervised Contrastive learning framework with adaptive classification criterion for Continual Learning.
Experiments show that CFL achieves state-of-the-art performance and has a stronger ability to overcome catastrophic forgetting than the classification baselines.
arXiv Detail & Related papers (2023-05-20T19:22:40Z)
- HiPool: Modeling Long Documents Using Graph Neural Networks [24.91040673099863]
Long sequences in Natural Language Processing (NLP) are a challenging problem.
Recent pretrained language models achieve satisfying performance in many NLP tasks.
We propose a new challenging benchmark, totaling six datasets with up to 53k samples and an average length of 4034 tokens.
arXiv Detail & Related papers (2023-05-05T06:58:24Z)
- FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting [22.821606402558707]
We develop a Frequency improved Legendre Memory model, or FiLM, to handle the dilemma between accurately preserving historical information and reducing the impact of noisy signals in the past.
Our empirical studies show that the proposed FiLM improves the accuracy of state-of-the-art models by a significant margin.
arXiv Detail & Related papers (2022-05-18T12:37:54Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)