Universal Pansharpening Foundation Model
- URL: http://arxiv.org/abs/2603.03831v1
- Date: Wed, 04 Mar 2026 08:30:15 GMT
- Title: Universal Pansharpening Foundation Model
- Authors: Hebaixu Wang, Jing Zhang, Haonan Guo, Di Wang, Jiayi Ma, Bo Du, Liangpei Zhang
- Abstract summary: Pansharpening generates the high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image and spectral attributes from a low-resolution MS image. We present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion.
- Score: 67.10467574892282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pansharpening generates the high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image and spectral attributes from a low-resolution MS image. Existing methods are predominantly satellite-specific and scene-dependent, which severely limits their generalization across heterogeneous sensors and varied scenes, thereby reducing their real-world practicality. To address these challenges, we present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion. Specifically, we introduce a modality-interleaved transformer that learns band-wise modal specializations to form reversible spectral affine bases, mapping arbitrary-band MS into a unified latent space via tensor multiplication. Building upon this, we construct a latent diffusion bridge model to progressively evolve latent representations, and incorporate bridge posterior sampling to couple latent diffusion with pixel-space observations, enabling stable and controllable fusion. Furthermore, we devise infinite-dimensional pixel-to-latent interaction mechanisms to comprehensively capture the cross-domain dependencies between PAN observations and MS representations, thereby facilitating complementary information fusion. In addition, to support large-scale training and evaluation, we construct a comprehensive pansharpening benchmark, termed PSBench, consisting of worldwide MS and PAN image pairs from multiple satellites across diverse scenes. Extensive experiments demonstrate that FoundPS consistently outperforms state-of-the-art methods, exhibiting superior generalization and robustness across a wide range of pansharpening tasks.
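The latent mapping described in the abstract (projecting an arbitrary-band MS image into a unified latent space via reversible spectral affine bases and tensor multiplication) can be sketched minimally. This is an illustrative sketch under stated assumptions, not FoundPS's actual implementation: the basis here is random rather than learned band-wise, and reversibility comes from the Moore-Penrose pseudo-inverse of a full-column-rank basis.

```python
import numpy as np

def make_affine_basis(num_bands, latent_dim, rng):
    # Hypothetical stand-in for a learned K x B spectral affine basis.
    # With latent_dim >= num_bands, a random Gaussian basis almost surely
    # has full column rank, so the mapping is invertible on the band axis.
    return rng.standard_normal((latent_dim, num_bands))

def to_latent(ms, basis):
    # ms: (H, W, B) multi-spectral cube -> (H, W, K) unified latent,
    # applied per pixel via tensor multiplication over the band axis.
    return np.einsum('hwb,kb->hwk', ms, basis)

def from_latent(z, basis):
    # Reverse the mapping with the pseudo-inverse of the basis, (B, K).
    pinv = np.linalg.pinv(basis)
    return np.einsum('hwk,bk->hwb', z, pinv)

# Demo: a 4-band cube mapped into an 8-dim latent space and recovered.
rng = np.random.default_rng(0)
basis = make_affine_basis(num_bands=4, latent_dim=8, rng=rng)
ms = rng.random((6, 6, 4))
z = to_latent(ms, basis)
ms_rec = from_latent(z, basis)
```

Because the same latent dimension K is shared regardless of the input band count B, sensors with different numbers of bands can be routed through one model; only the per-sensor basis changes.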
Related papers
- SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening [50.44337053599724]
SALAD-Pan is a sensor-agnostic latent space diffusion method for efficient pansharpening. It trains a band-wise single-channel VAE to encode high-resolution multispectral images into compact latent representations. It achieves high-precision fusion in the diffusion process and robust zero-shot (cross-sensor) capability.
arXiv Detail & Related papers (2026-02-04T12:01:07Z)
- Hyperspectral Image Fusion with Spectral-Band and Fusion-Scale Agnosticism [42.31159916095528]
Current deep learning models for Multispectral and Hyperspectral Image Fusion (MS/HS fusion) are typically designed for fixed spectral bands and spatial scales. We propose SSA, a universal framework for MS/HS fusion with spectral-band and fusion-scale agnosticism. Our single model achieves state-of-the-art performance while generalizing well to unseen sensors and scales.
arXiv Detail & Related papers (2026-02-02T05:48:53Z) - PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening [20.43260906326048]
We propose PAN-Crafter, a modality-consistent alignment framework. At its core, Modality-Adaptive Reconstruction (MARs) enables a single network to jointly reconstruct HRMS and PAN images. Experiments on multiple benchmark datasets demonstrate that PAN-Crafter outperforms the most recent state-of-the-art method on all metrics.
arXiv Detail & Related papers (2025-05-29T11:46:21Z)
- A Fusion-Guided Inception Network for Hyperspectral Image Super-Resolution [4.487807378174191]
We propose a single-image super-resolution model called the Fusion-Guided Inception Network (FGIN). Specifically, we first employ a spectral-spatial fusion module to effectively integrate spectral and spatial information. An Inception-like hierarchical feature extraction strategy is used to capture multiscale spatial dependencies. To further enhance reconstruction quality, we incorporate an optimized upsampling module that combines bilinear interpolation with depthwise separable convolutions.
arXiv Detail & Related papers (2025-05-06T11:15:59Z)
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, QuickBird, and WorldView-2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
- HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
HyperSIGMA is a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images.
arXiv Detail & Related papers (2024-06-17T13:22:58Z)
- A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic-range image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z)
- Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion [67.35540259040806]
We propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net.
As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components.
A self-supervised learning module is appended after the CSU net, enforcing material consistency to enhance the detailed appearance of the restored HS product.
arXiv Detail & Related papers (2022-05-07T23:40:36Z)
- Unsupervised Pansharpening Based on Self-Attention Mechanism [12.995590360954957]
We propose an unsupervised pansharpening (UP) method based on the self-attention mechanism (SAM) in a deep-learning framework to address these challenges.
The proposed approach reconstructs sharper MS images of different types, with more details and less spectral distortion compared to the state-of-the-art.
arXiv Detail & Related papers (2020-06-16T16:46:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.