VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion
- URL: http://arxiv.org/abs/2510.03244v1
- Date: Thu, 25 Sep 2025 14:02:26 GMT
- Title: VIFO: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion
- Authors: Yanlong Wang, Hang Yu, Jian Xu, Fei Ma, Hongkang Zhang, Tongtong Feng, Zijian Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang
- Abstract summary: We propose VIFO, a cross-modal forecasting model for multivariate time series. It renders multivariate time series into images, enabling a pre-trained LVM to extract complex cross-channel patterns. VIFO achieves competitive performance on multiple benchmarks.
- Score: 30.95449991386488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large time series foundation models often adopt channel-independent architectures to handle varying data dimensions, but this design ignores crucial cross-channel dependencies. Concurrently, existing multimodal approaches have not fully exploited the power of large vision models (LVMs) to interpret spatiotemporal data. Additionally, there remains significant unexplored potential in leveraging the advantages of information extraction from different modalities to enhance time series forecasting performance. To address these gaps, we propose VIFO, a cross-modal forecasting model. VIFO uniquely renders multivariate time series into images, enabling a pre-trained LVM to extract complex cross-channel patterns that are invisible to channel-independent models. These visual features are then aligned and fused with representations from the time series modality. By freezing the LVM and training only 7.45% of its parameters, VIFO achieves competitive performance on multiple benchmarks, offering an efficient and effective solution for capturing cross-variable relationships in multivariate time series forecasting.
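A rough, hedged sketch of the pipeline described above (not the authors' implementation): a multivariate series is rendered as a pseudo-image, encoded by a frozen vision backbone standing in for the LVM, and the resulting cross-channel features are fused with a simple per-variate temporal encoder before a small forecasting head. All module names, sizes, and the rendering scheme are assumptions made for illustration.

```python
# Illustrative sketch only: the frozen "vision" module stands in for a
# pre-trained LVM, and the rendering/fusion choices are assumptions, not VIFO's.
import torch
import torch.nn as nn
import torch.nn.functional as F


def render_as_image(x: torch.Tensor, size: int = 64) -> torch.Tensor:
    """(batch, variates, length) -> (batch, 1, size, size) pseudo-image.

    Each variate becomes one row; min-max scaling keeps values in [0, 1].
    """
    lo, hi = x.amin(dim=-1, keepdim=True), x.amax(dim=-1, keepdim=True)
    img = ((x - lo) / (hi - lo + 1e-8)).unsqueeze(1)          # (B, 1, C, L)
    return F.interpolate(img, size=(size, size), mode="bilinear", align_corners=False)


class VIFOSketch(nn.Module):
    def __init__(self, seq_len: int, horizon: int, d_model: int = 128):
        super().__init__()
        # Stand-in for a large pre-trained vision model; frozen below.
        self.vision = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        for p in self.vision.parameters():
            p.requires_grad = False                            # only the small heads train
        self.ts_encoder = nn.Linear(seq_len, d_model)          # per-variate temporal features
        self.align = nn.Linear(32, d_model)                    # project visual features
        self.head = nn.Linear(2 * d_model, horizon)            # fused features -> forecast

    def forward(self, x: torch.Tensor) -> torch.Tensor:        # x: (B, C, L)
        vis = self.align(self.vision(render_as_image(x)))      # (B, d_model): cross-channel view
        tmp = self.ts_encoder(x)                                # (B, C, d_model): per-channel view
        fused = torch.cat([tmp, vis.unsqueeze(1).expand_as(tmp)], dim=-1)
        return self.head(fused)                                 # (B, C, horizon)


y = VIFOSketch(seq_len=96, horizon=24)(torch.randn(8, 7, 96))
print(y.shape)  # torch.Size([8, 7, 24])
```

Freezing the backbone and training only the small alignment, temporal-encoder, and head layers mirrors, in spirit, the paper's claim of updating just 7.45% of parameters; the actual VIFO architecture and rendering procedure may differ substantially.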
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
- UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts. Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
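The "single cross-attention mechanism" referred to above can be pictured with this minimal sketch, in which time-series tokens attend to auxiliary context tokens (e.g., timestamp or text embeddings); shapes and names are hypothetical and not taken from UniDiff.

```python
# Hypothetical cross-attention fusion: series tokens (queries) attend to
# context tokens (keys/values) such as timestamp or text embeddings.
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

ts_tokens = torch.randn(8, 96, d_model)    # (batch, time steps, d_model)
ctx_tokens = torch.randn(8, 12, d_model)   # (batch, context tokens, d_model)

fused, _ = attn(query=ts_tokens, key=ctx_tokens, value=ctx_tokens)
print(fused.shape)  # torch.Size([8, 96, 64]) -- one attention call fuses both sources
```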
arXiv Detail & Related papers (2025-12-08T05:36:14Z)
- DiM-TS: Bridge the Gap between Selective State Space Models and Time Series for Generative Modeling [11.836475971106125]
Time series data plays a pivotal role in a wide variety of fields but faces challenges related to privacy concerns. We propose Lag Fusion Mamba and Permutation Scanning Mamba, which enhance the model's ability to discern significant patterns during the denoising process. We also introduce Diffusion Mamba for Time Series (DiM-TS), a high-quality time series generation model that better preserves the temporal periodicity and inter-channel correlations.
arXiv Detail & Related papers (2025-11-23T06:48:03Z)
- From Images to Signals: Are Large Vision Models Useful for Time Series Analysis? [62.58235852194057]
Transformer-based models have gained increasing attention in time series research. As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction.
arXiv Detail & Related papers (2025-05-29T22:05:28Z)
- Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting [53.332533610841885]
Time series can be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). We propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF.
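For context, the trend-seasonal decomposition that such multi-modal-view frameworks start from can be illustrated with a generic centred moving-average split; this is only the textbook form, not DMMV's backcast-residual adaptive decomposition.

```python
# Textbook moving-average decomposition, shown only to illustrate the idea of
# splitting a series into trend and seasonal/residual components.
import numpy as np

def decompose(x: np.ndarray, period: int):
    """Split a 1-D series into a centred-moving-average trend and a remainder."""
    kernel = np.ones(period) / period
    pad = period // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, kernel, mode="same")[pad:pad + len(x)]
    return trend, x - trend

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(200)
trend, remainder = decompose(x, period=24)
print(trend.shape, remainder.shape)  # (200,) (200,)
```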
arXiv Detail & Related papers (2025-05-29T20:55:24Z)
- Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines [5.543238821368548]
Time series often exhibit significant diversity in their temporal patterns across different time spans and domains. Time Tracker achieves state-of-the-art performance in prediction accuracy, model generalization, and adaptability.
arXiv Detail & Related papers (2025-05-21T06:18:41Z)
- MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
- MFF-FTNet: Multi-scale Feature Fusion across Frequency and Temporal Domains for Time Series Forecasting [18.815152183468673]
Time series forecasting is crucial in many fields, yet current deep learning models struggle with noise, data sparsity, and capturing complex patterns.
This paper presents MFF-FTNet, a novel framework addressing these challenges by combining contrastive learning with multi-scale feature extraction.
Extensive experiments on five real-world datasets demonstrate that MFF-FTNet significantly outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-11-26T12:41:42Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model, UniTST, containing a unified attention mechanism over the flattened patch tokens (a rough sketch of this idea follows below).
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
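A minimal sketch of attention over flattened variate-patch tokens, the general idea named above: patches from all variates are flattened into a single token sequence so one self-attention layer can mix inter-series and intra-series information. Dimensions and layer choices are illustrative, not UniTST's exact design.

```python
# Illustrative only: flatten (variates x patches) into one token sequence and
# let a single self-attention layer model both inter- and intra-series links.
import torch
import torch.nn as nn

B, C, L, patch, d_model = 8, 7, 96, 16, 64
x = torch.randn(B, C, L)

patches = x.unfold(dimension=-1, size=patch, step=patch)   # (B, C, L//patch, patch)
tokens = nn.Linear(patch, d_model)(patches)                # embed each patch
tokens = tokens.reshape(B, C * (L // patch), d_model)      # flatten variates x patches

encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
out = encoder(tokens)                                      # joint attention over all tokens
print(out.shape)  # torch.Size([8, 42, 64])
```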
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- Adaptive Convolutional Forecasting Network Based on Time Series Feature-Driven [9.133955922897371]
Time series data in real-world scenarios contain a substantial amount of nonlinear information.
We introduce multi-resolution convolution and deformable convolution operations.
We propose ACNet, an adaptive convolutional network designed to effectively model the local and global temporal dependencies.
arXiv Detail & Related papers (2024-05-20T14:05:35Z)
- SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods utilize mechanisms such as attention or mixers to capture channel correlations.
This paper presents an efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z)
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.