FlowDistill: Scalable Traffic Flow Prediction via Distillation from LLMs
- URL: http://arxiv.org/abs/2504.02094v1
- Date: Wed, 02 Apr 2025 19:54:54 GMT
- Title: FlowDistill: Scalable Traffic Flow Prediction via Distillation from LLMs
- Authors: Chenyang Yu, Xinpeng Xie, Yan Huang, Chenxi Qiu
- Abstract summary: FlowDistill is a lightweight traffic prediction framework based on knowledge distillation from large language models (LLMs). Despite its simplicity, FlowDistill consistently outperforms state-of-the-art models in prediction accuracy while requiring significantly less training data.
- Score: 5.6685153523382015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate traffic flow prediction is vital for optimizing urban mobility, yet it remains difficult in many cities due to complex spatio-temporal dependencies and limited high-quality data. While deep graph-based models demonstrate strong predictive power, their performance often comes at the cost of high computational overhead and substantial training data requirements, making them impractical for deployment in resource-constrained or data-scarce environments. We propose FlowDistill, a lightweight and scalable traffic prediction framework based on knowledge distillation from large language models (LLMs). In this teacher-student setup, a fine-tuned LLM guides a compact multi-layer perceptron (MLP) student model using a novel combination of the information bottleneck principle and a teacher-bounded regression loss, ensuring the distilled model retains only essential and transferable knowledge. Spatial and temporal correlations are explicitly encoded to enhance the model's generalization across diverse urban settings. Despite its simplicity, FlowDistill consistently outperforms state-of-the-art models in prediction accuracy while requiring significantly less training data and achieving lower memory usage and inference latency, highlighting its efficiency and suitability for real-world, scalable deployment.
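The abstract does not spell out the teacher-bounded regression loss, but the underlying idea (pull the student toward the teacher only where the teacher is actually the better regressor) can be sketched in a few lines. The following PyTorch snippet is a minimal illustration under that assumption; the function name, tensor shapes, and the `margin` hyperparameter are ours, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def teacher_bounded_regression_loss(student_pred, teacher_pred, target, margin=0.0):
    """Hypothetical sketch of a teacher-bounded regression loss.

    The student matches the ground truth directly, and is pulled toward the
    teacher's prediction only on samples where the teacher beats the student
    by at least `margin`. Names and the margin are illustrative assumptions.
    """
    # Per-sample errors against the ground truth.
    student_err = F.mse_loss(student_pred, target, reduction="none").mean(dim=-1)
    teacher_err = F.mse_loss(teacher_pred, target, reduction="none").mean(dim=-1)

    # Distill only where the teacher is actually the better regressor.
    use_teacher = (student_err > teacher_err + margin).float()
    distill_err = F.mse_loss(student_pred, teacher_pred.detach(), reduction="none").mean(dim=-1)

    hard_loss = student_err.mean()
    soft_loss = (use_teacher * distill_err).mean()
    return hard_loss + soft_loss


# Toy usage: batch of 8 nodes, forecast horizon of 12 steps.
student_pred = torch.randn(8, 12, requires_grad=True)
teacher_pred = torch.randn(8, 12)
target = torch.randn(8, 12)
loss = teacher_bounded_regression_loss(student_pred, teacher_pred, target)
loss.backward()
```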
Related papers
- PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting [30.055634767677823]
In urban computing, precise and swift forecasting of time series data from traffic networks is crucial. Current research is limited by the inherent inefficiency of existing models and their unsuitability for large-scale traffic applications due to model complexity. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on the principles of Multi-Layer Perceptrons (MLP). Our framework achieves comparable state-of-the-art performance while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
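PreMixer's exact architecture is not given in this summary; as a rough illustration of the kind of all-MLP building block such a model rests on, the sketch below implements a standard MLP-Mixer-style block (token-mixing plus channel-mixing MLPs with residual connections). The dimensions and block design are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Standard MLP-Mixer-style block, sketched as an example of an all-MLP
    component; not PreMixer's actual layer."""

    def __init__(self, num_patches, dim, hidden=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(num_patches, hidden), nn.GELU(),
                                       nn.Linear(hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                         nn.Linear(hidden, dim))

    def forward(self, x):
        # x: (batch, num_patches, dim); mix across patches, then across channels.
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))


# Toy usage: 8 series, 12 temporal patches, 16-dim embedding.
out = MixerBlock(num_patches=12, dim=16)(torch.randn(8, 12, 16))
print(out.shape)  # torch.Size([8, 12, 16])
```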
arXiv Detail & Related papers (2024-12-18T08:35:40Z) - Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
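The summary describes adaptive sample selection driven by signals from both teacher and student without giving the criterion. A minimal sketch, assuming selection keeps unlabeled samples where the teacher is confident and the student is still uncertain, could look like this (thresholds and names are illustrative, not LLKD's actual rule):

```python
import torch

def select_samples(teacher_logits, student_logits, conf_thresh=0.9, unc_thresh=1.0):
    """Illustrative sketch of adaptive sample selection for distillation.

    Keep unlabeled samples where the teacher is confident (max softmax
    probability above `conf_thresh`) and the student is still uncertain
    (predictive entropy above `unc_thresh`). Thresholds and names are
    assumptions, not the paper's exact criterion.
    """
    teacher_conf = teacher_logits.softmax(dim=-1).max(dim=-1).values

    student_probs = student_logits.softmax(dim=-1)
    student_entropy = -(student_probs * student_probs.clamp_min(1e-12).log()).sum(dim=-1)

    return (teacher_conf > conf_thresh) & (student_entropy > unc_thresh)


# Toy usage: 16 unlabeled samples, 5 classes.
mask = select_samples(torch.randn(16, 5) * 3, torch.randn(16, 5))
print(f"selected {int(mask.sum())} of {mask.numel()} samples")
```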
arXiv Detail & Related papers (2024-11-12T18:57:59Z) - Efficient Motion Prediction: A Lightweight & Accurate Trajectory Prediction Model With Fast Training and Inference Speed [56.27022390372502]
We propose a new efficient motion prediction model, which achieves highly competitive benchmark results while training only a few hours on a single GPU.
Its low inference latency makes it particularly suitable for deployment in autonomous applications with limited computing resources.
arXiv Detail & Related papers (2024-09-24T14:58:27Z) - EasyST: A Simple Framework for Spatio-Temporal Prediction [18.291117879544945]
We propose a simple framework for spatio-temporal prediction, the EasyST paradigm.
It learns lightweight and robust Multi-Layer Perceptrons (MLPs) with strong generalization by distilling knowledge from complex spatio-temporal GNNs.
EasyST surpasses state-of-the-art approaches in terms of efficiency and accuracy.
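EasyST's student is described only as a lightweight MLP distilled from spatio-temporal GNNs. As a rough sketch of what such a student could look like, the snippet below flattens each node's recent history and concatenates learned node and time-of-day embeddings before a small MLP head; the dimensions and embedding choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MLPStudent(nn.Module):
    """Minimal sketch of an MLP student for spatio-temporal prediction.

    Each node's history is concatenated with learned node and time-of-day
    embeddings, then mapped to the forecast horizon by a plain MLP.
    Illustrative only, not EasyST's exact design.
    """

    def __init__(self, num_nodes, history, horizon, steps_per_day=288, hidden=64):
        super().__init__()
        self.node_emb = nn.Embedding(num_nodes, hidden)
        self.time_emb = nn.Embedding(steps_per_day, hidden)
        self.mlp = nn.Sequential(
            nn.Linear(history + 2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x, node_idx, time_idx):
        # x: (batch, num_nodes, history); node_idx: (num_nodes,); time_idx: (batch,)
        n = self.node_emb(node_idx).unsqueeze(0).expand(x.size(0), -1, -1)
        t = self.time_emb(time_idx).unsqueeze(1).expand(-1, x.size(1), -1)
        return self.mlp(torch.cat([x, n, t], dim=-1))


# Toy usage: 4 sequences, 207 sensors, 12-step history -> 12-step forecast.
model = MLPStudent(num_nodes=207, history=12, horizon=12)
out = model(torch.randn(4, 207, 12), torch.arange(207), torch.randint(0, 288, (4,)))
print(out.shape)  # torch.Size([4, 207, 12])
```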
arXiv Detail & Related papers (2024-09-10T11:40:01Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis and ride-sharing services.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML).
We develop a physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR).
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow Prediction [32.44888387725925]
The proposed ST-Mamba model is the first to leverage the power of spatial-temporal learning in traffic flow prediction without using graph modeling.
The proposed ST-Mamba model achieves a 61.11% improvement in computational speed and increases prediction accuracy by 0.67%.
Experiments with real-world traffic datasets demonstrate that the ST-Mamba model sets a new benchmark in traffic flow prediction.
arXiv Detail & Related papers (2024-04-20T03:57:57Z) - TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models [27.306180426294784]
We introduce TPLLM, a novel traffic prediction framework leveraging Large Language Models (LLMs)
In this framework, we construct a sequence embedding layer based on Convolutional Neural Networks (CNNs) and a graph embedding layer based on Graph Convolutional Networks (GCNs) to extract sequence features and spatial features.
Experiments on two real-world datasets demonstrate commendable performance in both full-sample and few-shot prediction scenarios.
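The summary mentions a CNN-based sequence embedding and a GCN-based graph embedding feeding the LLM. A minimal sketch of how those two feature paths could be combined is shown below, assuming a 1-D convolution over each sensor's history and a single normalized-adjacency graph convolution; layer sizes and the concatenation-based fusion are illustrative assumptions, not TPLLM's exact design.

```python
import torch
import torch.nn as nn

class TrafficEmbedding(nn.Module):
    """Sketch of fusing a temporal CNN embedding with a GCN spatial embedding.

    A 1-D convolution summarizes each node's recent flow sequence, and one
    graph-convolution step (normalized adjacency multiply + linear map) mixes
    information across neighboring sensors. Illustrative assumptions only.
    """

    def __init__(self, hidden=32):
        super().__init__()
        self.temporal = nn.Conv1d(1, hidden, kernel_size=3, padding=1)
        self.spatial = nn.Linear(hidden, hidden)

    def forward(self, x, adj):
        # x: (num_nodes, history); adj: (num_nodes, num_nodes) normalized adjacency
        h = self.temporal(x.unsqueeze(1)).mean(dim=-1)   # (num_nodes, hidden)
        s = torch.relu(self.spatial(adj @ h))            # (num_nodes, hidden)
        return torch.cat([h, s], dim=-1)                 # (num_nodes, 2 * hidden)


# Toy usage: 10 sensors, 12-step history, simple row-normalized ring graph.
adj = torch.eye(10).roll(1, dims=1)
adj = (adj + adj.t() + torch.eye(10)) / 3.0
emb = TrafficEmbedding()(torch.randn(10, 12), adj)
print(emb.shape)  # torch.Size([10, 64])
```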
arXiv Detail & Related papers (2024-03-04T17:08:57Z) - Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider a FL framework with partial model pruning and personalization to overcome these challenges.
This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine-tuned for a specific device.
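A minimal sketch of the global/personalized split described above, assuming a shared feature extractor that is pruned and aggregated by the server and a locally fine-tuned head; layer sizes, the split point, and the L1 magnitude pruning call are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class SplitModel(nn.Module):
    """Sketch of a model split into a shared global part and a personal head.

    The global extractor is what a federated server would prune and aggregate
    across devices; the personalized head stays on-device and is fine-tuned
    locally. Sizes and the split point are illustrative assumptions.
    """

    def __init__(self, in_dim=32, hidden=64, out_dim=10):
        super().__init__()
        self.global_part = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.personal_part = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.personal_part(self.global_part(x))


model = SplitModel()

# Prune 50% of the shared extractor's weights (by L1 magnitude) before it is
# communicated; the personal head is left untouched.
prune.l1_unstructured(model.global_part[0], name="weight", amount=0.5)

# Only the global part's parameters would be uploaded for server-side aggregation.
shared_state = {k: v for k, v in model.state_dict().items() if k.startswith("global_part")}
out = model(torch.randn(4, 32))
print(out.shape, len(shared_state))
```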
arXiv Detail & Related papers (2023-09-04T21:10:45Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, albeit at high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)