Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting
- URL: http://arxiv.org/abs/2505.24003v2
- Date: Fri, 31 Oct 2025 01:29:58 GMT
- Title: Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting
- Authors: ChengAo Shen, Wenchao Yu, Ziming Zhao, Dongjin Song, Wei Cheng, Haifeng Chen, Jingchao Ni
- Abstract summary: Time series can be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). We propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF.
- Score: 53.332533610841885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identified in this work, the state-of-the-art (SOTA) LVM-based forecaster exhibits an inductive bias toward "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 SOTA models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets. The code for this paper is available at: https://github.com/D2I-Group/dmmv.
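The abstract's trend-seasonal decomposition is a standard first step in LTSF pipelines. As a rough sketch only (the moving-average kernel size and edge-padding scheme below are illustrative assumptions, not DMMV's actual design), a series can be split into a smooth trend plus a seasonal remainder that sum back exactly to the input:

```python
import numpy as np

def trend_seasonal_decompose(x, kernel_size=25):
    """Split a 1-D series into a smooth trend (moving average)
    and a seasonal/residual component, so that x = trend + seasonal."""
    # Pad both ends with the edge values so the moving average
    # covers the full length of the series.
    pad = kernel_size // 2
    padded = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend
    return trend, seasonal

# A sine wave riding on a linear trend.
t = np.arange(200, dtype=float)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)
trend, seasonal = trend_seasonal_decompose(x)
```

Each component can then be forecast by a view best suited to it (e.g., an image-based LVM for the periodic seasonal part) and the forecasts recombined.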
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z) - Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space [52.34072027212278]
Embedding models are a fundamental component of modern AI systems such as semantic search and retrieval-augmented generation. Recent advances in large foundation models have substantially accelerated the development of embedding models. We present the first systematic study of converting multimodal dLLMs into embedding models.
arXiv Detail & Related papers (2026-01-19T06:51:15Z) - Enhancing few-shot time series forecasting with LLM-guided diffusion [12.286204074670236]
Time series forecasting in specialized domains is often constrained by limited data availability. We propose LTSM-DIFF, a novel learning framework that integrates the expressive power of large language models with the generative capability of diffusion models. Our work establishes a new paradigm for time series analysis under data scarcity.
arXiv Detail & Related papers (2026-01-19T06:30:05Z) - VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion [30.95449991386488]
We propose VIFO, a cross-modal forecasting model for multivariate time series. It renders multivariate time series into images, enabling a pre-trained LVM to extract complex cross-channel patterns, and achieves competitive performance on multiple benchmarks.
arXiv Detail & Related papers (2025-09-25T14:02:26Z) - VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones [27.97547118858576]
We propose VisionTS++, a vision-model-based TSFM that performs continual pre-training on large-scale time series datasets. Our work establishes a new paradigm for cross-modal knowledge transfer, advancing the development of universal TSFMs.
arXiv Detail & Related papers (2025-08-06T12:17:09Z) - From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? [62.58235852194057]
Transformer-based models have gained increasing attention in time series research. As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction.
arXiv Detail & Related papers (2025-05-29T22:05:28Z) - HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding [67.24430397016275]
We propose a new early-fusion LMM that can fuse multi-modal inputs in the early stage and respond to visual instructions in an auto-regressive manner. The proposed model demonstrates superior performance compared to other LMMs using one transformer and significantly narrows the performance gap with compositional LMMs.
arXiv Detail & Related papers (2025-03-12T06:01:05Z) - Vision-Enhanced Time Series Forecasting via Latent Diffusion Models [12.54316645614762]
LDM4TS is a novel framework that leverages the powerful image reconstruction capabilities of latent diffusion models for vision-enhanced time series forecasting. We are the first to use complementary transformation techniques to convert time series into multi-view visual representations.
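One widely used series-to-image transformation of this kind (chosen here for illustration; the abstract does not say which transforms LDM4TS actually employs) is the Gramian Angular Summation Field, which turns a 1-D series into a symmetric 2-D image:

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field: encode a 1-D series as a
    (T, T) image, one possible 'visual view' of the signal."""
    # Rescale to [-1, 1] so the angular encoding (arccos) is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

x = np.sin(np.linspace(0, 4 * np.pi, 64))
img = gramian_angular_field(x)
```

Stacking several such transforms (e.g., GASF alongside recurrence plots or simple line renderings) yields the multi-view visual input a vision model can consume.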
arXiv Detail & Related papers (2025-02-16T14:15:06Z) - Harnessing Vision Models for Time Series Analysis: A Survey [72.09716244582684]
This survey discusses the advantages of vision models over LLMs in time series analysis. It provides a comprehensive and in-depth overview of the existing methods, organized by a detailed dual-view taxonomy. We address the challenges in the pre- and post-processing steps involved in this framework.
arXiv Detail & Related papers (2025-02-13T00:42:11Z) - M-CELS: Counterfactual Explanation for Multivariate Time Series Data Guided by Learned Saliency Maps [0.9374652839580181]
We introduce M-CELS, a counterfactual explanation model designed to enhance interpretability in multidimensional time series classification tasks.
Results demonstrate the superior performance of M-CELS in terms of validity, proximity, and sparsity.
arXiv Detail & Related papers (2024-11-04T22:16:24Z) - MMFNet: Multi-Scale Frequency Masking Neural Network for Multivariate Time Series Forecasting [6.733646592789575]
Long-term Time Series Forecasting (LTSF) is critical for numerous real-world applications, such as electricity consumption planning, financial forecasting, and disease propagation analysis.
We introduce MMFNet, a novel model designed to enhance long-term multivariate forecasting by leveraging a multi-scale masked frequency decomposition approach.
MMFNet captures fine, intermediate, and coarse-grained temporal patterns by converting time series into frequency segments at varying scales while employing a learnable mask to filter out irrelevant components adaptively.
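The segment-then-filter idea can be sketched as follows. This is a simplified stand-in: MMFNet learns its frequency mask end-to-end, whereas the mask below is a fixed low-pass mask, and the scale values are arbitrary illustrative choices:

```python
import numpy as np

def masked_frequency_filter(seg, keep_ratio=0.25):
    """Keep only the lowest-frequency fraction of rFFT coefficients
    and reconstruct the segment. (A learnable mask would replace
    this fixed low-pass mask in the actual model.)"""
    spec = np.fft.rfft(seg)
    mask = np.zeros_like(spec)
    mask[: max(1, int(len(spec) * keep_ratio))] = 1.0
    return np.fft.irfft(spec * mask, n=len(seg))

def multi_scale_frequency_decompose(x, scales=(8, 32, 128)):
    """Cut the series into segments at several scales and filter each
    segment in the frequency domain, giving fine / intermediate /
    coarse-grained views of the same signal."""
    views = []
    for seg_len in scales:
        n_seg = len(x) // seg_len
        segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
        filtered = np.vstack([masked_frequency_filter(s) for s in segs])
        views.append(filtered.reshape(-1))
    return views

t = np.arange(256, dtype=float)
x = np.sin(2 * np.pi * t / 16) + 0.1 * np.random.randn(256)
views = multi_scale_frequency_decompose(x)
```

Short segments expose fine-grained local frequencies; long segments expose coarse, slow-moving components.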
arXiv Detail & Related papers (2024-10-02T22:38:20Z) - LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting [69.33802286580786]
We introduce LTSM-Bundle, a comprehensive toolbox and benchmark for training LTSMs. It modularizes and benchmarks LTSMs along multiple dimensions, encompassing prompting strategies, tokenization approaches, base model selection, data quantity, and dataset diversity. Empirical results demonstrate that the best combination achieves superior zero-shot and few-shot performance compared to state-of-the-art LTSMs and traditional TSF methods.
arXiv Detail & Related papers (2024-06-20T07:09:19Z) - PDETime: Rethinking Long-Term Multivariate Time Series Forecasting from the Perspective of Partial Differential Equations [49.80959046861793]
We present PDETime, a novel LMTF model inspired by the principles of Neural PDE solvers.
Our experimentation across seven diverse real-world LMTF datasets reveals that PDETime adapts effectively to the intrinsic nature of the data.
arXiv Detail & Related papers (2024-02-25T17:39:44Z) - A Novel Decomposed-Ensemble Time Series Forecasting Framework: Capturing Underlying Volatility Information [6.590038231008498]
We propose a novel time series forecasting paradigm that integrates decomposition with the capability to capture the underlying fluctuation information of the series.
Both the numerical data and the volatility information for each sub-mode are harnessed to train a neural network.
This network is adept at predicting the information of the sub-modes, and we aggregate the predictions of all sub-modes to generate the final output.
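The decompose → per-sub-mode prediction → aggregate pipeline can be sketched with simple stand-ins: a moving-average split in place of the paper's mode decomposition, a rolling standard deviation as the volatility information, and a least-squares autoregressive model in place of a neural network. All three substitutions are illustrative assumptions, not the paper's method:

```python
import numpy as np

def decompose(x, kernel=25):
    """Stand-in decomposition into two sub-modes (trend + fluctuation)."""
    pad = kernel // 2
    padded = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return [trend, x - trend]

def rolling_volatility(x, window=10):
    """Rolling standard deviation as the sub-mode's volatility signal."""
    return np.array([x[max(0, i - window):i + 1].std() for i in range(len(x))])

def fit_ar(sub, vol, lags=8):
    """Least-squares predictor on [lagged values, last volatility]."""
    X, y = [], []
    for i in range(lags, len(sub)):
        X.append(np.concatenate([sub[i - lags:i], [vol[i - 1]]]))
        y.append(sub[i])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef

def forecast_next(x, lags=8):
    """Fit one predictor per sub-mode and sum their one-step forecasts."""
    total = 0.0
    for sub in decompose(x):
        vol = rolling_volatility(sub)
        coef = fit_ar(sub, vol, lags)
        feats = np.concatenate([sub[-lags:], [vol[-1]]])
        total += feats @ coef
    return total

t = np.arange(300, dtype=float)
x = 0.02 * t + np.sin(2 * np.pi * t / 12)
pred = forecast_next(x)
```

Because each sub-mode is simpler than the raw series, the per-sub-mode predictors have an easier task, and aggregation reassembles the full forecast.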
arXiv Detail & Related papers (2023-10-13T01:50:43Z) - Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z) - Multivariate time-series modeling with generative neural networks [0.0]
Generative moment matching networks (GMMNs) are introduced as dependence models for the joint innovation distribution of multivariate time series (MTS).
GMMNs are highly flexible and easy to simulate from, which is a major advantage over the copula-GARCH approach.
arXiv Detail & Related papers (2020-02-25T03:26:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.