Related papers: MGTS-Net: Exploring Graph-Enhanced Multimodal Fusion for Augmented Time Series Forecasting

URL: http://arxiv.org/abs/2510.16350v1
Date: Sat, 18 Oct 2025 04:47:10 GMT
Title: MGTS-Net: Exploring Graph-Enhanced Multimodal Fusion for Augmented Time Series Forecasting
Authors: Shule Hao, Junpeng Bao, Wenli Li
Abstract summary: We propose MGTS-Net, a Multimodal Graph-enhanced Network for Time Series forecasting. The model consists of three core components: (1) a Multimodal Feature Extraction layer (MFE), (2) a Multimodal Feature Fusion layer (MFF), and (3) a Multi-Scale Prediction layer (MSP).
Score: 1.7077661158850292
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent research in time series forecasting has explored integrating multimodal features into models to improve accuracy. However, the accuracy of such methods is constrained by three key challenges: inadequate extraction of fine-grained temporal patterns, suboptimal integration of multimodal information, and limited adaptability to dynamic multi-scale features. To address these problems, we propose MGTS-Net, a Multimodal Graph-enhanced Network for Time Series forecasting. The model consists of three core components: (1) a Multimodal Feature Extraction layer (MFE), which optimizes feature encoders according to the characteristics of temporal, visual, and textual modalities to extract temporal features of fine-grained patterns; (2) a Multimodal Feature Fusion layer (MFF), which constructs a heterogeneous graph to model intra-modal temporal dependencies and cross-modal alignment relationships and dynamically aggregates multimodal knowledge; (3) a Multi-Scale Prediction layer (MSP), which adapts to multi-scale features by dynamically weighting and fusing the outputs of short-term, medium-term, and long-term predictors. Extensive experiments demonstrate that MGTS-Net exhibits excellent performance with light weight and high efficiency. Compared with other state-of-the-art baseline models, our method achieves superior performance, validating the superiority of the proposed methodology.
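
As a rough illustration of the dynamic weighting described for the MSP layer, the sketch below fuses short-term, medium-term, and long-term forecasts with a softmax-normalized weighted sum. The abstract does not say how the weights are computed, so the softmax gate, the function names (`softmax`, `fuse_multiscale`), and the hand-set `gate_scores` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(predictions, gate_scores):
    """Fuse per-scale forecasts with a weighted sum.

    predictions: dict of scale name -> list of forecast values
                 (all lists share the same horizon length).
    gate_scores: dict of scale name -> unnormalized score.
                 Hypothetical input: in a trained model these would
                 come from a learned gating network, not be set by hand.
    Returns the fused forecast and the normalized weight per scale.
    """
    names = list(predictions)
    weights = softmax([gate_scores[name] for name in names])
    horizon = len(predictions[names[0]])
    fused = [0.0] * horizon
    for name, weight in zip(names, weights):
        for t, value in enumerate(predictions[name]):
            fused[t] += weight * value
    return fused, dict(zip(names, weights))


# Toy forecasts over a two-step horizon (made-up numbers).
preds = {"short": [1.0, 1.0], "medium": [2.0, 2.0], "long": [3.0, 3.0]}
# Equal scores give each predictor the same weight.
fused, weights = fuse_multiscale(preds, {"short": 0.0, "medium": 0.0, "long": 0.0})
```

With equal gate scores the fused forecast is the plain average of the three predictors. Raising one score shifts weight toward that scale, which is the sense in which the fusion can adapt to the input.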

Related papers

Graph Neural Networks with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion for Multivariate Time Series Forecasting [2.861817098638611]
We propose a Graph Neural Network (GNN) with Diversity-aware Neighbor Selection and Dynamic Multi-scale Fusion (DIMIGNN). DIMIGNN introduces a Diversity-aware Neighbor Selection Mechanism (DNSM) to ensure that each variable shares high informational similarity with its neighbors. Experiments on real-world datasets demonstrate that DIMIGNN consistently outperforms prior methods.
arXiv Detail & Related papers (2025-09-28T06:23:43Z)
SDGF: Fusing Static and Multi-Scale Dynamic Correlations for Multivariate Time Series Forecasting [9.027814258970684]
Inter-series correlations are crucial for accurate time series forecasting. These relationships often exhibit complex dynamics across different temporal scales. Existing methods are limited in modeling these multi-scale dependencies.
arXiv Detail & Related papers (2025-09-14T11:23:12Z)
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
MFF-FTNet: Multi-scale Feature Fusion across Frequency and Temporal Domains for Time Series Forecasting [18.815152183468673]
Time series forecasting is crucial in many fields, yet current deep learning models struggle with noise, data sparsity, and capturing complex patterns. This paper presents MFF-FTNet, a novel framework addressing these challenges by combining contrastive learning with multi-scale feature extraction. Extensive experiments on five real-world datasets demonstrate that MFF-FTNet significantly outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-11-26T12:41:42Z)
MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network. It simultaneously considers correlations at three levels to enhance prediction performance. It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion [18.138433117711177]
We propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with other state-of-the-art methods.
arXiv Detail & Related papers (2024-05-28T07:24:56Z)
U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling [3.867363075280544]
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data. A new model is developed, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
arXiv Detail & Related papers (2023-01-06T10:08:11Z)
Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision community. This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs [65.18780403244178]
We propose a continuous model to forecast Multivariate Time series with dynamic Graph neural Ordinary Differential Equations (MTGODE). Specifically, we first abstract multivariate time series into dynamic graphs with time-evolving node features and unknown graph structures. Then, we design and solve a neural ODE to complement missing graph topologies and unify both spatial and temporal message passing.
arXiv Detail & Related papers (2022-02-17T02:17:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.