M3S-Net: Multimodal Feature Fusion Network Based on Multi-scale Data for Ultra-short-term PV Power Forecasting
- URL: http://arxiv.org/abs/2602.19832v1
- Date: Mon, 23 Feb 2026 13:30:59 GMT
- Title: M3S-Net: Multimodal Feature Fusion Network Based on Multi-scale Data for Ultra-short-term PV Power Forecasting
- Authors: Penghui Niu, Taotao Cai, Suqi Zhang, Junhua Gu, Ping Zhang, Qiqi Liu, Jianxin Li
- Abstract summary: Intermittency and high-frequency variability of solar irradiance present significant challenges to high-penetration grids. Existing architectures predominantly rely on shallow feature concatenation and binary cloud segmentation. This paper proposes M3S-Net, a novel multimodal fusion network based on multi-scale data for ultra-short-term PV power forecasting.
- Score: 13.79706446185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inherent intermittency and high-frequency variability of solar irradiance, particularly during rapid cloud advection, present significant stability challenges to high-penetration photovoltaic grids. Although multimodal forecasting has emerged as a viable mitigation strategy, existing architectures predominantly rely on shallow feature concatenation and binary cloud segmentation, thereby failing to capture the fine-grained optical features of clouds and the complex spatiotemporal coupling between visual and meteorological modalities. To bridge this gap, this paper proposes M3S-Net, a novel multimodal feature fusion network based on multi-scale data for ultra-short-term PV power forecasting. First, a multi-scale partial channel selection network leverages partial convolutions to explicitly isolate the boundary features of optically thin clouds, effectively transcending the precision limitations of coarse-grained binary masking. Second, a multi-scale sequence to image analysis network employs Fast Fourier Transform (FFT)-based time-frequency representation to disentangle the complex periodicity of meteorological data across varying time horizons. Crucially, the model incorporates a cross-modal Mamba interaction module featuring a novel dynamic C-matrix swapping mechanism. By exchanging state-space parameters between visual and temporal streams, this design conditions the state evolution of one modality on the context of the other, enabling deep structural coupling with linear computational complexity, thus overcoming the limitations of shallow concatenation. Experimental validation on the newly constructed fine-grained PV power dataset demonstrates that M3S-Net achieves a mean absolute error reduction of 6.2% in 10-minute forecasts compared to state-of-the-art baselines. The dataset and source code will be available at https://github.com/she1110/FGPD.
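The C-matrix swapping idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a simplified diagonal state-space recurrence h_t = a * h_(t-1) + x_t W_b with readout y_t = c_t . h_t, where each stream derives its per-step readout vector c_t from its own input, and the two streams then exchange those readout vectors. The names `selective_scan`, `cross_modal_swap`, `make_params`, and all shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_params(d_in, n_state, rng):
    """Hypothetical helper: random parameters for one toy stream."""
    return {
        "a": rng.uniform(0.5, 0.99, size=n_state),           # diagonal state decay
        "W_b": rng.normal(scale=0.1, size=(d_in, n_state)),  # input projection
        "W_c": rng.normal(scale=0.1, size=(d_in, n_state)),  # readout projection
    }


def selective_scan(x, a, W_b, c_seq):
    """Toy diagonal state-space scan (an assumption, not the Mamba kernel).

    x:     (T, D) input sequence
    a:     (N,)   per-state decay
    W_b:   (D, N) input projection
    c_seq: (T, N) per-step readout vectors
    Returns y of shape (T,). One pass over T, so cost is linear in T.
    """
    h = np.zeros(a.shape[0])
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        h = a * h + x[t] @ W_b   # state update driven by this stream's input
        y[t] = c_seq[t] @ h      # readout with the supplied C vectors
    return y


def cross_modal_swap(x_vis, x_ts, params_vis, params_ts):
    """Each stream reads out its own state with the other stream's C vectors."""
    c_vis = x_vis @ params_vis["W_c"]  # (T, N), derived from the visual stream
    c_ts = x_ts @ params_ts["W_c"]     # (T, N), derived from the temporal stream
    y_vis = selective_scan(x_vis, params_vis["a"], params_vis["W_b"], c_ts)
    y_ts = selective_scan(x_ts, params_ts["a"], params_ts["W_b"], c_vis)
    return y_vis, y_ts


rng = np.random.default_rng(0)
p_vis = make_params(d_in=4, n_state=3, rng=rng)
p_ts = make_params(d_in=2, n_state=3, rng=rng)
y_vis, y_ts = cross_modal_swap(
    rng.normal(size=(5, 4)), rng.normal(size=(5, 2)), p_vis, p_ts
)
print(y_vis.shape, y_ts.shape)
```

In this sketch both streams must share the sequence length and state size so that the readout vectors are interchangeable, while the input dimensions may differ. How the paper aligns the visual and meteorological streams before swapping is not stated in the abstract.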
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z) - MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction [7.015114232190396]
Accurate high-resolution precipitation nowcasting from radar echo sequences is crucial for disaster mitigation and economic planning. Key difficulties include modeling complex multi-scale evolution, inter-frame feature misalignment caused by displacement, and efficiently capturing long-range context. We present the Multi-scale Feature Communication Rectified Flow Network (MFC-RFNet), a generative framework that integrates multi-scale communication with guided feature fusion.
arXiv Detail & Related papers (2026-01-07T06:24:26Z) - USF-Net: A Unified Spatiotemporal Fusion Network for Ground-Based Remote Sensing Cloud Image Sequence Extrapolation [7.868367798549883]
Ground-based remote sensing cloud image sequence extrapolation is a key research area in the development of photovoltaic power systems. We propose USF-Net, a Unified Spatiotemporal Fusion Network that integrates adaptive large-kernel convolutions and a low-complexity attention mechanism. As a key contribution, we also introduce and release the ASI-CIS dataset.
arXiv Detail & Related papers (2025-11-12T06:54:40Z) - Reservoir Computing via Multi-Scale Random Fourier Features for Forecasting Fast-Slow Dynamical Systems [0.0]
We present a novel reservoir computing framework that combines delay embedding with random Fourier feature (RFF) mappings to capture such dynamics. Two formulations are investigated: a single-scale RFF reservoir, which employs a fixed kernel bandwidth, and a multi-scale RFF reservoir, which integrates multiple bandwidths to represent both fast and slow temporal dependencies.
arXiv Detail & Related papers (2025-11-04T08:01:08Z) - MGTS-Net: Exploring Graph-Enhanced Multimodal Fusion for Augmented Time Series Forecasting [1.7077661158850292]
We propose MGTS-Net, a Multimodal Graph-enhanced Network for Time Series forecasting. The model consists of three core components: (1) a Multimodal Feature Extraction layer (MFE), (2) a Multimodal Feature Fusion layer (MFF), and (3) a Multi-Scale Prediction layer (MSP).
arXiv Detail & Related papers (2025-10-18T04:47:10Z) - Adaptive Fuzzy Time Series Forecasting via Partially Asymmetric Convolution and Sub-Sliding Window Fusion [0.0]
We propose a novel convolutional architecture with a partially asymmetric design based on the time of the sliding window. The proposed method achieves state-of-the-art results on most popular time series datasets.
arXiv Detail & Related papers (2025-07-28T08:58:25Z) - Mutual Information-driven Triple Interaction Network for Efficient Image
Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal
Transformer [112.12271800369741]
Wind power is attracting increasing attention around the world due to its renewable, pollution-free, and other advantages.
Accurate wind power forecasting (WPF) can effectively reduce power fluctuations in power system operations.
Existing methods are mainly designed for short-term predictions and lack effective spatial-temporal feature augmentation.
arXiv Detail & Related papers (2023-05-30T04:03:15Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate
Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.