No Length Left Behind: Enhancing Knowledge Tracing for Modeling
Sequences of Excessive or Insufficient Lengths
- URL: http://arxiv.org/abs/2308.03488v1
- Date: Mon, 7 Aug 2023 11:30:58 GMT
- Title: No Length Left Behind: Enhancing Knowledge Tracing for Modeling
Sequences of Excessive or Insufficient Lengths
- Authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui
Zhao
- Abstract summary: Knowledge tracing aims to predict students' responses to practices based on their historical question-answering behaviors.
As sequences get longer, computational costs will increase exponentially.
We propose a model called Sequence-Flexible Knowledge Tracing (SFKT)
- Score: 3.2687390531088414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge tracing (KT) aims to predict students' responses to practices based
on their historical question-answering behaviors. However, most current KT
methods focus on improving overall AUC, leaving ample room for optimization in
modeling sequences of excessive or insufficient lengths. As sequences get
longer, computational costs will increase exponentially. Therefore, KT methods
usually truncate sequences to an acceptable length, which makes it difficult
for models on online service systems to capture complete historical practice
behaviors of students with too long sequences. Conversely, modeling students
with short practice sequences using most KT methods may result in overfitting
due to limited observation samples. To address the above limitations, we
propose a model called Sequence-Flexible Knowledge Tracing (SFKT).
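The fixed-window truncation that the abstract criticizes can be illustrated with a minimal sketch. The window size and the `(question, correct)` record format here are hypothetical choices for illustration, not details of SFKT, which is designed precisely to avoid this truncation:

```python
# Minimal sketch of the fixed-window truncation most KT methods apply
# before feeding a student's history to the model. MAX_LEN is a
# hypothetical hyperparameter, not a value from the paper.
MAX_LEN = 200

def truncate_history(interactions, max_len=MAX_LEN):
    """Keep only the most recent `max_len` (question, correct) pairs.

    Older interactions are discarded, so students with long sequences
    lose part of their practice history -- the limitation SFKT targets.
    """
    return interactions[-max_len:]

# 500 simulated answers; only the last 200 survive truncation, so the
# first 300 interactions are invisible to the model.
history = [(q, q % 2 == 0) for q in range(500)]
window = truncate_history(history)
```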
Related papers
- How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities [0.6798775532273751]
Recent advances in system engineering have enabled the scaling up of models that are purported to support extended context lengths.

We show that while such claims may be sound theoretically, there remain large practical gaps that are empirically observed.
arXiv Detail & Related papers (2024-07-11T01:08:39Z) - CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling [52.404072802235234]
We introduce Chunked Instruction-aware State Eviction (CItruS), a modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states.
Our training-free method exhibits superior performance on long sequence comprehension and retrieval tasks over several strong baselines under the same memory budget.
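As a rough illustration of evicting cached states under a fixed memory budget based on attention scores, consider the toy sketch below. The function name, the scalar scores, and the budget are all invented for illustration; CItruS's actual chunked, instruction-aware eviction is more involved:

```python
def evict_states(states, scores, budget):
    """Keep the `budget` cached states with the highest attention scores.

    `states` is a list of cached hidden states and `scores` their
    (task-aware) attention preferences; the rest are evicted. This is a
    toy stand-in for score-based eviction, not the CItruS algorithm.
    """
    ranked = sorted(range(len(states)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])  # preserve the original sequence order
    return [states[i] for i in keep]

cache = ["s0", "s1", "s2", "s3", "s4"]
attn = [0.1, 0.9, 0.3, 0.8, 0.2]
kept = evict_states(cache, attn, budget=3)  # -> ["s1", "s2", "s3"]
```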
arXiv Detail & Related papers (2024-06-17T18:34:58Z) - Long Range Propagation on Continuous-Time Dynamic Graphs [18.5534584418248]
Continuous-Time Graph Anti-Symmetric Network (CTAN) is designed for efficient propagation of information.
We show how CTAN's empirical performance on synthetic long-range benchmarks and real-world benchmarks is superior to other methods.
arXiv Detail & Related papers (2024-06-04T19:42:19Z) - Mitigating Catastrophic Forgetting in Task-Incremental Continual
Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a Supervised Contrastive learning framework with adaptive classification criterion for Continual Learning.
Experiments show that CFL achieves state-of-the-art performance and is better at overcoming catastrophic forgetting than the classification baselines.
arXiv Detail & Related papers (2023-05-20T19:22:40Z) - HiPool: Modeling Long Documents Using Graph Neural Networks [24.91040673099863]
Long sequences in Natural Language Processing (NLP) are a challenging problem.
Recent pretraining language models achieve satisfying performances in many NLP tasks.
We propose a new challenging benchmark, totaling six datasets with up to 53k samples and an average length of 4,034 tokens.
arXiv Detail & Related papers (2023-05-05T06:58:24Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to that of state-of-the-art BERT4Rec.
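The recency-based sampling idea can be sketched as drawing training targets from a sequence with probabilities that decay for older positions. The exponential decay form and its rate below are assumptions made for illustration, not the paper's exact training objective:

```python
import random

def recency_sample(sequence, k, decay=0.9, rng=None):
    """Sample k training targets, favouring recent positions.

    Position i (0 = oldest) gets weight decay ** (n - 1 - i), so the most
    recent item is the most likely target. Illustrative only; the decay
    schedule is a hypothetical choice, not the paper's objective.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    n = len(sequence)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return rng.choices(sequence, weights=weights, k=k)

seq = list(range(20))          # positions double as item ids here
targets = recency_sample(seq, k=5)
```

With 20 positions and decay 0.9, the bulk of the sampled targets come from the recent end of the sequence, which is the intuition behind the objective.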
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - FiLM: Frequency improved Legendre Memory Model for Long-term Time Series
Forecasting [22.821606402558707]
We develop a Frequency improved Legendre Memory model (FiLM) to handle the dilemma between accurately preserving historical information and reducing the impact of noisy signals in the past.
Our empirical studies show that the proposed FiLM improves the accuracy of state-of-the-art models by a significant margin.
arXiv Detail & Related papers (2022-05-18T12:37:54Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to infer from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.