FlatFormer: A Flat Transformer Knowledge Tracing Model Based on Cognitive Bias Injection
- URL: http://arxiv.org/abs/2512.06629v1
- Date: Sun, 07 Dec 2025 02:32:10 GMT
- Title: FlatFormer: A Flat Transformer Knowledge Tracing Model Based on Cognitive Bias Injection
- Authors: Xiao-li Xia, Hou-biao Li
- Abstract summary: Knowledge Tracing models face a critical ``Performance-Complexity Trap.'' We propose FlatFormer, a streamlined architecture based on the novel design paradigm of ``Information Injection over Structural Stacking.'' Experiments on four large-scale datasets show that FlatFormer achieves state-of-the-art performance.
- Score: 0.5729426778193398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Tracing (KT) models face a critical ``Performance-Complexity Trap'': capturing complex cognitive dynamics like learning sessions and memory decay typically requires deep hierarchical architectures, which incur prohibitive computational costs for real-time deployment. To resolve this, we propose FlatFormer, a streamlined architecture based on the novel design paradigm of ``Information Injection over Structural Stacking.'' Unlike parameter-heavy hierarchical models, FlatFormer leverages a standard flat Transformer augmented with two lightweight injection mechanisms: (i) a hybrid input encoding strategy combining learnable session identifiers with fixed sinusoidal step embeddings; and (ii) a pre-computed power-law bias integrated directly into the attention logits to explicitly model the forgetting curve. Extensive experiments on four large-scale datasets (e.g., EdNet, Junyi) show that FlatFormer achieves state-of-the-art performance. For example, on the EdNet dataset, it improves absolute AUC by 8.3% over the strongest hierarchical baseline (HiTSKT) while using less than 15% of the parameters and running inference about three times faster. These results validate that high cognitive fidelity does not necessitate architectural complexity.
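The abstract describes the two injection mechanisms only at a high level. Below is a minimal PyTorch sketch of one plausible reading: a hybrid encoding that sums learnable session embeddings with a fixed sinusoidal step table, and a pre-computed causal bias added to attention logits so that attention weight decays as a power law of the temporal gap. The `decay` exponent, `max_sessions`, and the exact `-decay * log(1 + gap)` form are illustrative assumptions, not the paper's published definitions.

```python
import math
import torch
import torch.nn as nn

class HybridInputEncoding(nn.Module):
    """Learnable session IDs + fixed sinusoidal step embeddings (sketch)."""
    def __init__(self, d_model: int, max_sessions: int, max_steps: int):
        super().__init__()
        self.session_emb = nn.Embedding(max_sessions, d_model)  # learnable session identifiers
        pe = torch.zeros(max_steps, d_model)                    # fixed sinusoidal step table
        pos = torch.arange(max_steps, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("step_pe", pe)                     # not trained

    def forward(self, x, session_ids, step_ids):
        # x: (batch, seq, d_model); session_ids/step_ids: (batch, seq) int64
        return x + self.session_emb(session_ids) + self.step_pe[step_ids]

def powerlaw_attention_bias(seq_len: int, decay: float = 0.5) -> torch.Tensor:
    """Pre-computed causal bias: logits[i, j] += -decay * log(1 + (i - j)).

    Inside the softmax this multiplies each attention weight by
    (1 + gap)^(-decay), i.e. a power-law forgetting curve over the gap;
    the decay exponent here is a hypothetical hyperparameter.
    """
    steps = torch.arange(seq_len)
    gap = steps.unsqueeze(1) - steps.unsqueeze(0)           # temporal gap i - j
    bias = -decay * torch.log1p(gap.clamp(min=0).float())   # power law in logit space
    return bias.masked_fill(gap < 0, float("-inf"))         # causal mask for j > i

# Usage: add the bias to raw attention scores before the softmax, e.g.
#   scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
#   scores = scores + powerlaw_attention_bias(scores.size(-1))
#   attn = scores.softmax(dim=-1)
```

Because the bias depends only on sequence length, it can be computed once and reused across batches, which is consistent with the abstract's claim that the mechanism is lightweight.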
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - NoiseFormer -- Noise Diffused Symmetric Attention Transformer [0.0]
We propose a novel unified model architecture called the Noise Diffused Symmetric Attention Transformer to enhance model performance. The proposed model is validated on the GPT-2 base model, with performance gains falling between plain symmetric attention and the GPT-2 base model.
arXiv Detail & Related papers (2026-01-10T14:10:48Z) - FilletRec: A Lightweight Graph Neural Network with Intrinsic Features for Automated Fillet Recognition [2.402309979435103]
This paper proposes an end-to-end, data-driven framework specifically for fillet features. We first construct and release a large-scale, diverse benchmark dataset for fillet recognition. We then propose FilletRec, a lightweight graph neural network. Experiments show that FilletRec surpasses state-of-the-art methods in both accuracy and generalization.
arXiv Detail & Related papers (2025-11-04T02:27:18Z) - Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution [88.20464308588889]
We propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. The method is designed by unfolding an SR optimization function constrained by structural similarity. Our model outperforms current state-of-the-art models with lower parameter counts and reduced memory consumption.
arXiv Detail & Related papers (2025-06-13T14:29:40Z) - Large EEG-U-Transformer for Time-Step Level Detection Without Pre-Training [1.3254304182988286]
We propose a simple U-shaped model that efficiently learns representations by capturing both local and global features. Compared to other window-level classification models, our method directly outputs predictions at the time-step level. Our model won first place in the 2025 seizure detection challenge organized at the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.
arXiv Detail & Related papers (2025-04-01T01:33:42Z) - Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local Spherical Euclidean 3D equivariance property through SE(3) message passing based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z) - Phantom Embeddings: Using Embedding Space for Model Regularization in
Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Such complex models tend to memorize the training data, which results in poor generalization on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Adversarial Audio Synthesis with Complex-valued Polynomial Networks [60.231877895663956]
Time-frequency (TF) representations in audio have increasingly been modeled with real-valued networks.
We introduce complex-valued networks, called APOLLO, that integrate such complex-valued representations in a natural way.
APOLLO yields a 17.5% improvement over adversarial methods and 8.2% over state-of-the-art diffusion models on SC09 in audio generation.
arXiv Detail & Related papers (2022-06-14T12:58:59Z) - Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks, owing to its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z) - Revealing the Invisible with Model and Data Shrinking for
Composite-database Micro-expression Recognition [49.463864096615254]
We analyze the influence of learning complexity, including input complexity and model complexity.
We propose a recurrent convolutional network (RCN) to explore shallower architectures and lower-resolution input data.
We develop three parameter-free modules to integrate with RCN without increasing any learnable parameters.
arXiv Detail & Related papers (2020-06-17T06:19:24Z)