Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning
- URL: http://arxiv.org/abs/2505.23195v1
- Date: Thu, 29 May 2025 07:33:49 GMT
- Title: Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning
- Authors: Lifan Zhao, Yanyan Shen, Zhaoyang Liu, Xue Wang, Jiaji Deng
- Abstract summary: Time Series Foundation Models pre-train vast parameters and achieve remarkable zero-shot forecasting performance. Surprisingly, even after fine-tuning, TSFMs cannot consistently outperform smaller, specialized models trained on full-shot downstream data. We propose a structured pruning method to regularize the subsequent fine-tuning process by focusing it on a more relevant and compact parameter space.
- Score: 29.377178687865136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling laws motivate the development of Time Series Foundation Models (TSFMs) that pre-train vast parameters and achieve remarkable zero-shot forecasting performance. Surprisingly, even after fine-tuning, TSFMs cannot consistently outperform smaller, specialized models trained on full-shot downstream data. A key question is how to realize effective adaptation of TSFMs for a target forecasting task. Through empirical studies on various TSFMs, we find that the pre-trained models often exhibit inherent sparsity and redundancy in computation, suggesting that TSFMs have learned to activate task-relevant network substructures to accommodate diverse forecasting tasks. To preserve this valuable prior knowledge, we propose a structured pruning method to regularize the subsequent fine-tuning process by focusing it on a more relevant and compact parameter space. Extensive experiments on seven TSFMs and six benchmarks demonstrate that fine-tuning a smaller, pruned TSFM significantly improves forecasting performance compared to fine-tuning original models. This "prune-then-finetune" paradigm often enables TSFMs to achieve state-of-the-art performance and surpass strong specialized baselines.
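The abstract does not spell out the pruning criterion or fine-tuning recipe; as a rough illustration of the "prune-then-finetune" idea, the sketch below applies structured pruning to the feed-forward channels of a transformer-style forecaster using an L1-norm importance score (an assumed criterion, not necessarily the authors') and then fine-tunes the reduced model on downstream data. The `model.blocks`, `ffn_in`, and `ffn_out` names are hypothetical for this sketch.

```python
import torch
import torch.nn as nn

def prune_ffn_channels(ffn_in: nn.Linear, ffn_out: nn.Linear, keep_ratio: float = 0.5):
    """Structured pruning of one feed-forward block.

    Ranks hidden channels by the L1 norm of their input weights (an
    illustrative importance score) and keeps the top `keep_ratio` fraction,
    returning smaller Linear layers with the surviving rows/columns copied over.
    """
    hidden = ffn_in.out_features
    keep = max(1, int(hidden * keep_ratio))
    importance = ffn_in.weight.abs().sum(dim=1)          # one score per hidden channel
    kept = torch.topk(importance, keep).indices.sort().values

    new_in = nn.Linear(ffn_in.in_features, keep, bias=ffn_in.bias is not None)
    new_out = nn.Linear(keep, ffn_out.out_features, bias=ffn_out.bias is not None)
    with torch.no_grad():
        new_in.weight.copy_(ffn_in.weight[kept])
        if ffn_in.bias is not None:
            new_in.bias.copy_(ffn_in.bias[kept])
        new_out.weight.copy_(ffn_out.weight[:, kept])
        if ffn_out.bias is not None:
            new_out.bias.copy_(ffn_out.bias)
    return new_in, new_out

def prune_then_finetune(model, train_loader, keep_ratio=0.5, lr=1e-4, epochs=1):
    """Prune every FFN pair in `model.blocks`, then fine-tune on downstream data."""
    for blk in model.blocks:                             # assumes blocks expose .ffn_in / .ffn_out
        blk.ffn_in, blk.ffn_out = prune_ffn_channels(blk.ffn_in, blk.ffn_out, keep_ratio)

    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for past, future in train_loader:                # assumes (history, target) batches
            opt.zero_grad()
            loss = loss_fn(model(past), future)
            loss.backward()
            opt.step()
    return model
```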
Related papers
- Multi-Scale Finetuning for Encoder-based Time Series Foundation Models [56.503053716053]
Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. We argue that naive finetuning falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. We propose Multi-Scale Finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process.
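The summary does not describe how multi-scale modeling enters the finetuning loop; one plausible reading, sketched below, is to forecast from several temporal resolutions of the input and learn how to mix the results. The `backbone` interface and average-pooling downsampling are assumptions for this sketch, not MSFT's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHead(nn.Module):
    """Illustrative multi-scale wrapper: forecasts from several temporal
    resolutions of the input are combined with learned mixing weights.
    Assumes the backbone maps a series of any length to a fixed-length forecast."""

    def __init__(self, backbone: nn.Module, scales=(1, 2, 4)):
        super().__init__()
        self.backbone = backbone
        self.scales = scales
        self.mix = nn.Parameter(torch.ones(len(scales)) / len(scales))

    def forward(self, x):                                   # x: (batch, length)
        outs = []
        for s in self.scales:
            if s > 1:                                        # downsample by average pooling
                xs = F.avg_pool1d(x.unsqueeze(1), kernel_size=s, stride=s).squeeze(1)
            else:
                xs = x
            outs.append(self.backbone(xs))
        w = torch.softmax(self.mix, dim=0)                   # learned weights over scales
        return sum(w[i] * o for i, o in enumerate(outs))
```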
arXiv Detail & Related papers (2025-06-17T01:06:01Z)
- Can Time-Series Foundation Models Perform Building Energy Management Tasks? [5.450531952940644]
Building energy management tasks require processing and learning from a variety of time-series data. Existing solutions rely on bespoke task- and data-specific models to perform these tasks. Inspired by the transformative success of Large Language Models (LLMs), Time-Series Foundation Models (TSFMs) have the potential to change this.
arXiv Detail & Related papers (2025-06-12T19:45:10Z)
- TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster [14.512119661418522]
We present TS-RAG, a retrieval-augmented generation framework for time series forecasting. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant segments from a dedicated knowledge base. We show that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming the existing TSFMs by up to 6.84% across diverse domains.
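As a hedged illustration of the retrieval step described above, the snippet below embeds the query window, retrieves the k most similar stored segments, and blends their continuations with the model's own forecast; the `encode`, `base_forecast`, and knowledge-base arrays are placeholders, and TS-RAG's actual fusion mechanism may differ.

```python
import numpy as np

def retrieve_and_forecast(query, kb_embeddings, kb_continuations,
                          encode, base_forecast, k=5, alpha=0.5):
    """Illustrative retrieval-augmented forecast (not TS-RAG's exact mechanism).

    `encode` maps a series to a fixed-length embedding (stand-in for the
    pre-trained encoder); `kb_embeddings` / `kb_continuations` hold embedded
    historical segments and the values that followed them; `base_forecast`
    is the TSFM's own prediction function.
    """
    q = encode(query)                                            # (d,)
    sims = kb_embeddings @ q / (
        np.linalg.norm(kb_embeddings, axis=1) * np.linalg.norm(q) + 1e-8
    )                                                            # cosine similarity per segment
    top = np.argsort(-sims)[:k]                                  # k most similar stored segments
    retrieved = kb_continuations[top].mean(axis=0)               # average their continuations
    return alpha * base_forecast(query) + (1 - alpha) * retrieved
```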
arXiv Detail & Related papers (2025-03-06T16:48:48Z)
- Monte Carlo Tree Diffusion for System 2 Planning [57.50512800900167]
We introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of Monte Carlo Tree Search (MCTS). Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined.
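A faithful MCTD implementation needs a trained diffusion planner; the toy sketch below only illustrates the stated idea of evaluating, pruning, and refining partially denoised plans, using a beam-style search rather than full MCTS. `denoise_step` and `score` are placeholder callables, not the paper's components.

```python
import numpy as np

def tree_denoise(init_noise, denoise_step, score, n_steps=10, branch=3, beam=5, rng=None):
    """Toy tree-structured denoising loosely inspired by MCTD (not its algorithm).

    `denoise_step(plan, step, rng)` performs one stochastic denoising step and
    `score(plan)` evaluates a partially denoised plan; both stand in for a real
    diffusion planner and value/reward model.
    """
    rng = rng or np.random.default_rng(0)
    frontier = [init_noise]                              # partially denoised plans kept so far
    for step in range(n_steps):
        children = []
        for plan in frontier:
            for _ in range(branch):                      # sample several denoising continuations
                children.append(denoise_step(plan, step, rng))
        children.sort(key=score, reverse=True)           # evaluate partial plans ...
        frontier = children[:beam]                       # ... and prune low-scoring branches
    return max(frontier, key=score)                      # best fully denoised plan
```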
arXiv Detail & Related papers (2025-02-11T02:51:42Z)
- Investigating Compositional Reasoning in Time Series Foundation Models [16.421597202235112]
We study the impact of TSFM architecture design on compositional reasoning and generalization. We find that patch-based Transformers have the best reasoning performance. In some zero-shot out-of-distribution scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data.
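For reference, the moving-average and exponential-smoothing baselines mentioned above can be written in a few lines; the window size and smoothing factor below are arbitrary choices for illustration.

```python
import numpy as np

def moving_average_forecast(series, window=12, horizon=24):
    """Forecast by repeating the mean of the last `window` observations."""
    series = np.asarray(series, dtype=float)
    return np.full(horizon, series[-window:].mean())

def exponential_smoothing_forecast(series, alpha=0.3, horizon=24):
    """Simple exponential smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1};
    the final level is repeated as a flat forecast."""
    series = np.asarray(series, dtype=float)
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(horizon, level)
```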
arXiv Detail & Related papers (2025-02-09T21:21:55Z)
- Time Series Foundational Models: Their Role in Anomaly Detection and Prediction [0.0]
Time series foundational models (TSFM) have gained prominence in time series forecasting. This paper critically evaluates the efficacy of TSFM in anomaly detection and prediction tasks.
arXiv Detail & Related papers (2024-12-26T17:15:30Z)
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow. We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
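UNCURL's exact criterion is not given in this summary; the sketch below shows one simple form of offline, task-aware expert selection based on how often the router picks each expert on task data, which is an assumption rather than the paper's method.

```python
import torch

def select_experts_by_usage(router_logits: torch.Tensor, keep_k: int) -> torch.Tensor:
    """Offline, task-aware expert selection for one MoE layer (illustrative).

    `router_logits` are gating logits collected on task data, shape
    (num_tokens, num_experts). Experts are ranked by how often they would be
    chosen and only the indices of the top `keep_k` are returned.
    """
    choices = router_logits.argmax(dim=-1)                          # expert picked per token
    counts = torch.bincount(choices, minlength=router_logits.shape[-1])
    kept = torch.topk(counts, keep_k).indices.sort().values
    return kept                                                     # indices of experts to keep

# Usage sketch: for each MoE layer, run task data through the router, call
# select_experts_by_usage, then drop the unselected expert weights and the
# corresponding router columns before deploying the reduced model.
```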
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
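A minimal sketch of the freezing step described above, assuming the newly introduced prompt and adapter modules can be identified by name (a naming convention adopted here for illustration, not taken from the paper's code):

```python
import torch.nn as nn

def prepare_peft(model: nn.Module, trainable_keywords=("adapter", "prompt")) -> nn.Module:
    """Freeze the backbone and leave only adapter/prompt parameters trainable."""
    for name, param in model.named_parameters():
        # Parameters whose names contain an adapter/prompt keyword stay trainable.
        param.requires_grad = any(k in name for k in trainable_keywords)

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable} / {total}")
    return model
```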
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
- Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
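The summary does not fix a particular decomposition; a common concrete instance (moving-average seasonal-trend decomposition with separate linear heads, in the style of DLinear) is sketched below for illustration and is not necessarily this paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedLinearForecaster(nn.Module):
    """Seasonal-trend decomposition with separate linear heads (a common
    pattern for long-term forecasting; an illustrative stand-in here)."""

    def __init__(self, input_len: int, horizon: int, kernel: int = 25):
        super().__init__()
        self.kernel = kernel
        self.trend_head = nn.Linear(input_len, horizon)
        self.season_head = nn.Linear(input_len, horizon)

    def forward(self, x):                                   # x: (batch, input_len)
        pad = (self.kernel - 1) // 2
        xp = F.pad(x.unsqueeze(1), (pad, self.kernel - 1 - pad), mode="replicate")
        trend = F.avg_pool1d(xp, self.kernel, stride=1).squeeze(1)   # moving-average trend
        seasonal = x - trend                                         # residual / seasonal part
        return self.trend_head(trend) + self.season_head(seasonal)
```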
arXiv Detail & Related papers (2024-01-22T13:15:40Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)