Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
- URL: http://arxiv.org/abs/2507.02939v2
- Date: Sun, 20 Jul 2025 17:02:56 GMT
- Title: Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
- Authors: Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu
- Abstract summary: This paper proposes a framework, Spectral Decoupled Knowledge Distillation (termed SDKD), which transfers the multi-scale representations from a complex teacher model to a more efficient lightweight student network. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity.
- Score: 37.00869900861736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatiotemporal forecasting tasks, such as traffic flow, combustion dynamics, and weather forecasting, often require complex models that suffer from low training efficiency and high memory consumption. This paper proposes a lightweight framework, Spectral Decoupled Knowledge Distillation (termed SDKD), which transfers the multi-scale spatiotemporal representations from a complex teacher model to a more efficient lightweight student network. The teacher model follows an encoder-latent evolution-decoder architecture, where its latent evolution module decouples high-frequency details and low-frequency trends using convolution and a Transformer (as a global low-frequency modeler). However, the multi-layer convolution and deconvolution structures result in slow training and high memory usage. To address these issues, we propose a frequency-aligned knowledge distillation strategy, which extracts multi-scale spectral features from the teacher's latent space, including both high- and low-frequency components, to guide the lightweight student model in capturing both local fine-grained variations and global evolution patterns. Experimental results show that SDKD significantly improves performance, achieving reductions of up to 81.3% in MSE and 52.3% in MAE on the Navier-Stokes equation dataset. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity. Our code is available at https://github.com/itsnotacie/SDKD
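As a concrete illustration of what a frequency-aligned distillation objective could look like, here is a minimal PyTorch sketch assuming 2-D latent feature maps; the rFFT band split, the `cutoff` knob, and the per-band MSE are assumptions based only on the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def frequency_aligned_kd_loss(student_latent, teacher_latent, cutoff=8):
    """Align student and teacher latents separately in low- and
    high-frequency bands (a hypothetical reading of SDKD's strategy).

    student_latent, teacher_latent: (B, C, H, W) latent feature maps.
    cutoff: number of low-frequency modes kept per axis (assumed knob).
    """
    # Real FFT over the two spatial axes of the latent tensors.
    fs = torch.fft.rfft2(student_latent, norm="ortho")
    ft = torch.fft.rfft2(teacher_latent, norm="ortho")

    # Low-frequency band: the first `cutoff` modes along each axis
    # (the first axis's negative-frequency tail is ignored for brevity).
    low_s, low_t = fs[..., :cutoff, :cutoff], ft[..., :cutoff, :cutoff]

    # High-frequency band: everything outside the low-frequency block.
    mask = torch.ones_like(fs.real)
    mask[..., :cutoff, :cutoff] = 0.0
    high_s, high_t = fs * mask, ft * mask

    # Match spectra band by band (MSE on complex values via view_as_real).
    loss_low = F.mse_loss(torch.view_as_real(low_s), torch.view_as_real(low_t))
    loss_high = F.mse_loss(torch.view_as_real(high_s), torch.view_as_real(high_t))
    return loss_low + loss_high
```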
Related papers
- LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators [20.77877474840923]
Capturing high-frequency information is a critical challenge in machine learning. Deep neural nets exhibit the so-called spectral bias toward learning low-frequency components. We propose a novel frequency-sensitive loss term based on radially binned spectral errors.
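Based only on this summary, a hedged sketch of a radially binned spectral error; the number of bins and the per-annulus L1 aggregation are assumptions, not the paper's definition.

```python
import torch

def radially_binned_spectral_error(pred, target, n_bins=16):
    """Hypothetical sketch: compare power spectra of prediction and
    target inside radial frequency bins, then average the per-bin gaps.

    pred, target: (B, C, H, W) fields on a regular grid.
    """
    B, C, H, W = pred.shape
    # Power spectra of both fields.
    p_pred = torch.fft.fft2(pred, norm="ortho").abs() ** 2
    p_tgt = torch.fft.fft2(target, norm="ortho").abs() ** 2

    # Radial wavenumber |k| for every (ky, kx) grid point.
    ky = torch.fft.fftfreq(H, device=pred.device)
    kx = torch.fft.fftfreq(W, device=pred.device)
    radius = torch.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)

    # Assign each frequency to one of n_bins annuli.
    bins = torch.clamp((radius / radius.max() * n_bins).long(), max=n_bins - 1)

    loss = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            # Mean spectral-energy gap inside this annulus.
            loss = loss + (p_pred[..., m] - p_tgt[..., m]).abs().mean()
    return loss / n_bins
```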
arXiv Detail & Related papers (2025-04-05T19:35:04Z) - FlowDistill: Scalable Traffic Flow Prediction via Distillation from LLMs [5.6685153523382015]
FlowDistill is a lightweight traffic prediction framework based on knowledge distillation from large language models (LLMs). Despite its simplicity, FlowDistill consistently outperforms state-of-the-art models in prediction accuracy while requiring significantly less training data.
arXiv Detail & Related papers (2025-04-02T19:54:54Z) - BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting [46.922741972636025]
Time-series forecasting is crucial for numerous real-world applications, including weather prediction and financial market modeling. We propose BEAT (Balanced frEquency Adaptive Tuning), a novel framework that monitors the training status of each frequency and adaptively adjusts its gradient updates. BEAT consistently outperforms state-of-the-art approaches in experiments on seven real-world datasets.
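The summary does not specify the mechanism; one plausible, purely assumed reading is to track a smoothed per-frequency-band loss and upweight the bands that are improving slowly, as in this loose sketch (not BEAT's actual algorithm).

```python
import torch

class FrequencyReweighter:
    """Loose sketch of per-frequency adaptive tuning: bands whose loss
    is improving slowly get larger weights, so their gradient
    contribution grows; fast-learning bands are damped.
    """

    def __init__(self, n_bands, momentum=0.9):
        self.ema = torch.ones(n_bands)   # smoothed per-band losses
        self.momentum = momentum

    def __call__(self, band_losses):
        # band_losses: tensor of current per-frequency-band losses.
        self.ema = self.ema.to(band_losses.device)
        with torch.no_grad():
            self.ema = self.momentum * self.ema + (1 - self.momentum) * band_losses
            # Normalize around 1 so lagging bands weigh more than average.
            weights = self.ema / self.ema.mean()
        # The weighted sum is the scalar loss actually backpropagated.
        return (weights * band_losses).sum()
```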
arXiv Detail & Related papers (2025-01-31T11:52:35Z) - Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs [86.35471039808023]
We introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases the number of frequency modes used by the model.
We show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets.
Our method demonstrates a 10% lower testing error, using 20% fewer frequency modes compared to the existing Fourier Neural Operator, while also achieving 30% faster training.
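A toy sketch of the incremental idea, not the authors' implementation: a 1-D Fourier layer in which only the first `active_modes` modes participate, plus a step-based schedule (`mode_schedule` is an assumed placeholder) that grows the mode count during training.

```python
import torch

def spectral_conv_1d(x, weights, active_modes):
    """Toy 1-D Fourier layer where only the first `active_modes`
    frequency modes participate.

    x: (B, C, N) signal; weights: complex tensor of shape (C, C, max_modes).
    """
    x_ft = torch.fft.rfft(x)            # (B, C, N//2 + 1)
    out_ft = torch.zeros_like(x_ft)
    m = active_modes
    # Mix channels mode by mode for the active low-frequency modes only.
    out_ft[:, :, :m] = torch.einsum("bim,iom->bom", x_ft[:, :, :m], weights[:, :, :m])
    return torch.fft.irfft(out_ft, n=x.shape[-1])

def mode_schedule(step, start=4, max_modes=32, every=1000):
    # Grow the number of active modes as training progresses.
    return min(max_modes, start + step // every)
```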
arXiv Detail & Related papers (2022-11-28T09:57:15Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Flatter, faster: scaling momentum for optimal speedup of SGD [0.0]
We study the training dynamics arising from the interplay between stochastic gradient descent (SGD), label noise, and momentum in the training of neural networks.
We find that scaling the momentum hyperparameter $1-\beta$ with the learning rate to the power of $2/3$ maximally accelerates training, without sacrificing generalization.
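Read literally, the rule is $1-\beta \propto \eta^{2/3}$ for learning rate $\eta$; a tiny helper under that assumption (the proportionality constant `c` is a free choice, not from the paper):

```python
def momentum_from_lr(lr, c=1.0):
    # Scaling rule from the abstract: 1 - beta is proportional to lr**(2/3).
    beta = 1.0 - c * lr ** (2.0 / 3.0)
    return max(0.0, beta)  # clamp in case c * lr**(2/3) exceeds 1
```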
arXiv Detail & Related papers (2022-10-28T20:41:48Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - Inception Transformer [151.939077819196]
Inception Transformer, or iFormer, learns comprehensive features with both high- and low-frequency information in visual data.
We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation.
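A rough sketch of an Inception-style token mixer under assumptions: the channel split ratio, the depthwise convolution, and vanilla multi-head attention are stand-ins, and the paper's exact branches may differ. Note `dim * high_ratio` must leave the attention width divisible by `heads`.

```python
import torch
import torch.nn as nn

class InceptionMixerSketch(nn.Module):
    """Split channels, route one part through convolution (local,
    high-frequency detail) and the rest through self-attention
    (global, low-frequency structure), then concatenate."""

    def __init__(self, dim, high_ratio=0.5, heads=4):
        super().__init__()
        self.c_high = int(dim * high_ratio)
        c_low = dim - self.c_high
        self.conv = nn.Conv2d(self.c_high, self.c_high, 3, padding=1, groups=self.c_high)
        self.attn = nn.MultiheadAttention(c_low, heads, batch_first=True)

    def forward(self, x):                 # x: (B, C, H, W)
        B, C, H, W = x.shape
        xh, xl = x[:, :self.c_high], x[:, self.c_high:]
        # High-frequency branch: depthwise conv keeps local detail.
        xh = self.conv(xh)
        # Low-frequency branch: global self-attention over all positions.
        t = xl.flatten(2).transpose(1, 2)  # (B, H*W, c_low)
        t, _ = self.attn(t, t, t)
        xl = t.transpose(1, 2).reshape(B, C - self.c_high, H, W)
        return torch.cat([xh, xl], dim=1)
```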
arXiv Detail & Related papers (2022-05-25T17:59:54Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in video sequences.
This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
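A very loose sketch of the idea above, not the paper's tensor-train algebra: fold a window of past convolutional features into a running low-rank state, one frame at a time, instead of concatenating all frames at once. All module names and shapes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConvAcrossTimeSketch(nn.Module):
    """Aggregate a window of past hidden states with a chain of small
    convolutions, passing a low-rank intermediate between steps."""

    def __init__(self, channels, rank, window):
        super().__init__()
        self.in_proj = nn.ModuleList(
            nn.Conv2d(channels, rank, 3, padding=1) for _ in range(window)
        )
        self.mix = nn.ModuleList(
            nn.Conv2d(rank, rank, 3, padding=1) for _ in range(window)
        )
        self.out_proj = nn.Conv2d(rank, channels, 3, padding=1)

    def forward(self, history):  # list of (B, C, H, W) tensors, oldest first
        state = 0.0
        for h, proj, mix in zip(history, self.in_proj, self.mix):
            # Each step folds one frame into the running low-rank state.
            state = mix(proj(h) + state)
        return self.out_proj(state)
```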
arXiv Detail & Related papers (2020-02-21T05:00:01Z)