A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
- URL: http://arxiv.org/abs/2504.00901v2
- Date: Fri, 11 Jul 2025 11:18:07 GMT
- Title: A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
- Authors: Enzhe Sun, Yongchuan Cui, Peng Liu, Jining Yan
- Abstract summary: This paper presents the first comprehensive survey of deep learning advances in remote sensing STF over the past decade. We establish a taxonomy of deep learning architectures including CNNs, Transformers, Generative Adversarial Networks (GANs), diffusion models, and sequence models. We identify five critical challenges: time-space conflicts, generalization across datasets, computational efficiency for large-scale processing, multi-source heterogeneous fusion, and insufficient benchmark diversity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote sensing spatiotemporal fusion (STF) addresses the fundamental trade-off between temporal and spatial resolution by combining high temporal-low spatial and high spatial-low temporal imagery. This paper presents the first comprehensive survey of deep learning advances in remote sensing STF over the past decade. We establish a systematic taxonomy of deep learning architectures including Convolutional Neural Networks (CNNs), Transformers, Generative Adversarial Networks (GANs), diffusion models, and sequence models, revealing significant growth in deep learning adoption for STF tasks. Our analysis reveals that CNN-based methods dominate spatial feature extraction, while Transformer architectures show superior performance in capturing long-range temporal dependencies. GAN and diffusion models demonstrate exceptional capability in detail reconstruction, substantially outperforming traditional methods in structural similarity and spectral fidelity. Through comprehensive experiments on seven benchmark datasets comparing ten representative methods, we validate these findings and quantify the performance trade-offs between different approaches. We identify five critical challenges: time-space conflicts, limited generalization across datasets, computational efficiency for large-scale processing, multi-source heterogeneous fusion, and insufficient benchmark diversity. The survey highlights promising opportunities in foundation models, hybrid architectures, and self-supervised learning approaches that could address current limitations and enable multimodal applications. The specific models, datasets, and other information mentioned in this article have been collected in: https://github.com/yc-cui/Deep-Learning-Spatiotemporal-Fusion-Survey.
Related papers
- CAST: Cross-Attentive Spatio-Temporal feature fusion for Deepfake detection [0.0]
CNNs are effective at capturing spatial artifacts, and Transformers excel at modeling temporal inconsistencies. We propose a unified CAST model that leverages cross-attention to effectively fuse spatial and temporal features. We evaluate the performance of our model using the FaceForensics++, Celeb-DF, and DeepfakeDetection datasets.
arXiv Detail & Related papers (2025-06-26T18:51:17Z)
- Multivariate Long-term Time Series Forecasting with Fourier Neural Filter [55.09326865401653]
We introduce FNF as the backbone and DBD as architecture to provide excellent learning capabilities and optimal learning pathways for spatial-temporal modeling. We show that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling.
arXiv Detail & Related papers (2025-06-10T18:40:20Z)
- UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatiotemporal modeling. Our work demonstrates that a task-specific vision-text can build a generalizable model for spatiotemporal learning. We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z)
- Deep Learning for Spatio-Temporal Fusion in Land Surface Temperature Estimation: A Comprehensive Survey, Experimental Analysis, and Future Trends [3.344876133162209]
Land Surface Temperature (LST) plays a critical role in understanding key environmental processes. Satellite sensors often face a trade-off between spatial and temporal resolutions. Spatio-Temporal Fusion (STF) has emerged as a powerful method to integrate two satellite data sources.
arXiv Detail & Related papers (2024-12-21T13:53:15Z)
- Higher-order Cross-structural Embedding Model for Time Series Analysis [12.35149125898563]
Time series analysis has gained significant attention due to its critical applications in diverse fields such as healthcare, finance, and sensor networks.
Current approaches struggle to model higher-order interactions within time series, and focus on learning temporal or spatial dependencies separately.
We propose Higher-order Cross-structural Embedding Model for Time Series (High-TS), a novel framework that jointly models both temporal and spatial perspectives.
arXiv Detail & Related papers (2024-10-30T12:51:14Z)
- A Comprehensive Survey of Deep Learning for Time Series Forecasting: Architectural Diversity and Open Challenges [37.20655606514617]
Time series forecasting is a critical task that provides key information for decision-making. Recent research has shown that alternatives such as simple linear layers can outperform Transformers.
arXiv Detail & Related papers (2024-10-24T07:43:55Z)
- Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs).
It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts.
We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z)
- State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing.
The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z)
- SFANet: Spatial-Frequency Attention Network for Weather Forecasting [54.470205739015434]
Weather forecasting plays a critical role in various sectors, driving decision-making and risk management.
Traditional methods often struggle to capture the complex dynamics of meteorological systems.
We propose a novel framework designed to address these challenges and enhance the accuracy of weather prediction.
arXiv Detail & Related papers (2024-05-29T08:00:15Z)
- Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection [28.40070234949818]
Time Travelling Pixels (TTP) is a novel approach that integrates the latent knowledge of a foundation model into change detection.
The state-of-the-art results obtained on LEVIR-CD underscore the efficacy of TTP.
arXiv Detail & Related papers (2023-12-23T08:56:52Z)
- GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation [19.371155159744934]
In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors.
The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and temporal relationships in the observed time series.
Traditionally, intricate spatiotemporal imputation has relied on specific architectures, which suffer from limited applicability and high computational complexity.
In contrast, our approach integrates pre-trained large language models (LLMs) into intricate spatiotemporal imputation, introducing a groundbreaking framework, GATGPT.
arXiv Detail & Related papers (2023-11-24T08:15:11Z)
- Deep Learning for Spatiotemporal Big Data: A Vision on Opportunities and Challenges [4.497634148674422]
Spatiotemporal big data can foster new opportunities to solve problems that could not be solved before.
The distinctive characteristics of big data pose new challenges for deep learning technologies.
arXiv Detail & Related papers (2023-10-30T19:12:51Z)
- Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances [51.70835702029498]
This review aims to present a comprehensive review of the recent achievements in deep learning based RSOD methods.
We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision.
We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD.
arXiv Detail & Related papers (2023-09-13T06:48:32Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and spatio-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Network (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- Tensor Decompositions for Hyperspectral Data Processing in Remote Sensing: A Comprehensive Review [85.36368666877412]
Hyperspectral (HS) remote sensing (RS) imaging has provided a significant amount of spatial and spectral information for the observation and analysis of the Earth's surface.
The recent advancement and even revolution of the HS RS technique offer opportunities to realize the full potential of various applications.
Due to the maintenance of the 3-D HS inherent structure, tensor decomposition has aroused widespread concern and research in HS data processing tasks.
arXiv Detail & Related papers (2022-05-13T00:39:23Z)
- Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review [33.40031994803646]
This survey aims to present a systematic overview in DL-based multimodal RS data fusion.
Sub-fields in the multimodal RS data fusion are reviewed in terms of to-be-fused data modalities.
The remaining challenges and potential future directions are highlighted.
arXiv Detail & Related papers (2022-05-03T09:08:16Z)
- Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn the phase spaces properties.
Our approach is either as competitive as or better than most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z)
- Spatiotemporal Fusion in 3D CNNs: A Probabilistic View [129.84064609199663]
We propose to convert the spatiotemporal fusion strategies into a probability space, which allows us to perform network-level evaluations of various fusion strategies without having to train them separately.
Our approach greatly boosts the efficiency of analyzing spatiotemporal fusion.
We generate new fusion strategies which achieve state-of-the-art performance on four well-known action recognition datasets.
arXiv Detail & Related papers (2020-04-10T10:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.