PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
- URL: http://arxiv.org/abs/2505.23367v2
- Date: Tue, 15 Jul 2025 06:54:06 GMT
- Title: PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
- Authors: Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk, Jaehyup Lee, Munchurl Kim
- Abstract summary: We propose PAN-Crafter, a modality-consistent alignment framework. At its core, Modality-Adaptive Reconstruction (MARs) enables a single network to jointly reconstruct HRMS and PAN images. Experiments on multiple benchmark datasets demonstrate that our PAN-Crafter outperforms the most recent state-of-the-art method in all metrics.
- Score: 20.43260906326048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: PAN-sharpening aims to fuse high-resolution panchromatic (PAN) images with low-resolution multi-spectral (MS) images to generate high-resolution multi-spectral (HRMS) outputs. However, cross-modality misalignment -- caused by sensor placement, acquisition timing, and resolution disparity -- poses a fundamental challenge. Conventional deep learning methods assume perfect pixel-wise alignment and rely on per-pixel reconstruction losses, leading to spectral distortion, double edges, and blurring when misalignment is present. To address this, we propose PAN-Crafter, a modality-consistent alignment framework that explicitly mitigates the misalignment gap between PAN and MS modalities. At its core, Modality-Adaptive Reconstruction (MARs) enables a single network to jointly reconstruct HRMS and PAN images, leveraging PAN's high-frequency details as auxiliary self-supervision. Additionally, we introduce Cross-Modality Alignment-Aware Attention (CM3A), a novel mechanism that bidirectionally aligns MS texture to PAN structure and vice versa, enabling adaptive feature refinement across modalities. Extensive experiments on multiple benchmark datasets demonstrate that our PAN-Crafter outperforms the most recent state-of-the-art method in all metrics, even with 50.11$\times$ faster inference time and 0.63$\times$ the memory size. Furthermore, it demonstrates strong generalization performance on unseen satellite datasets, showing its robustness across different conditions.
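The bidirectional refinement the abstract attributes to CM3A can be pictured with a minimal single-head cross-attention sketch: MS features attend to PAN features and vice versa, and each modality is updated with the other as context. This is not the authors' implementation; the shapes, the residual update, and the single-head formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, d_k):
    """Single-head attention: `query_feats` (N, d) attends to `context_feats` (M, d)."""
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (N, M) affinity
    return softmax(scores, axis=-1) @ context_feats        # (N, d) context summary

rng = np.random.default_rng(0)
ms_feats = rng.standard_normal((64, 32))   # flattened MS feature tokens (assumed shape)
pan_feats = rng.standard_normal((64, 32))  # flattened PAN feature tokens (assumed shape)

# Bidirectional refinement: each modality is residually updated using the other.
ms_refined = ms_feats + cross_attention(ms_feats, pan_feats, d_k=32)
pan_refined = pan_feats + cross_attention(pan_feats, ms_feats, d_k=32)
```

In this reading, the MS branch borrows PAN structure while the PAN branch borrows MS texture, which is one plausible way a network could refine features adaptively across modalities.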
Related papers
- Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
Arbitrary-scale image super-resolution (ASISR) aims to recover high-resolution images at arbitrary scales from a low-resolution input.
In this study, we construct a rotation-equivariant ASISR method.
arXiv Detail & Related papers (2025-08-07T08:51:03Z)
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization.
AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
- Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment.
This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain.
Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z) - Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image [51.333064033152304]
Recently launched satellites can concurrently acquire HSIs and panchromatic (PAN) images.<n>Hipandas is a novel learning paradigm that reconstructs HRHS images from noisy low-resolution HSIs and high-resolution PAN images.
arXiv Detail & Related papers (2024-12-05T14:39:29Z) - Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on the PRISMA, QuickBird, and WorldView-2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo [8.303396507129266]
MSP-MVS is a method introducing multi-granularity segmentation prior to edge-confined patch deformation.<n>We implement equidistribution and disassemble-clustering of correlative reliable pixels.<n>We also introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs.
arXiv Detail & Related papers (2024-07-27T19:00:44Z) - CMT: Cross Modulation Transformer with Hybrid Loss for Pansharpening [14.459280238141849]
Pansharpening aims to enhance remote sensing image (RSI) quality by merging high-resolution panchromatic (PAN) with multispectral (MS) images.
Prior techniques struggled to optimally fuse PAN and MS images for enhanced spatial and spectral information.
We present the Cross Modulation Transformer (CMT), a pioneering method that modifies the attention mechanism.
arXiv Detail & Related papers (2024-04-01T13:55:44Z) - PC-GANs: Progressive Compensation Generative Adversarial Networks for
Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z) - HyperTransformer: A Textural and Spectral Feature Fusion Transformer for
Pansharpening [60.89777029184023]
Pansharpening aims to fuse a registered high-resolution panchromatic image (PAN) with a low-resolution hyperspectral image (LR-HSI) to generate an enhanced HSI with high spectral and spatial resolution.
Existing pansharpening approaches neglect to use an attention mechanism to transfer HR texture features from PAN to LR-HSI features, resulting in spatial and spectral distortions.
We present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively.
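The query/key formulation described above can be sketched in a few lines: LR-HSI features form the queries, PAN features form the keys, and (as an assumption for illustration) PAN texture features serve as the values, so high-resolution texture is transferred to the spectral branch. This is not the paper's implementation; all dimensions and the value choice are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 32  # illustrative feature dimension
rng = np.random.default_rng(1)
q = rng.standard_normal((48, d))   # queries: flattened LR-HSI feature tokens
k = rng.standard_normal((48, d))   # keys: flattened PAN feature tokens
v = rng.standard_normal((48, d))   # values: PAN texture features (assumption)

attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (48, 48) cross-modality affinity
fused = attn @ v                               # PAN texture routed to HSI tokens
```

Each spectral token thus receives a convex combination of PAN texture features weighted by cross-modality similarity, which is the general mechanism the abstract describes.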
arXiv Detail & Related papers (2022-03-04T18:59:08Z) - SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment
for Satellite Imagery [36.24121979886052]
Pan-sharpening is the process of merging a high-resolution (HR) panchromatic (PAN) image with its corresponding low-resolution (LR) multi-spectral (MS) image to create a high-resolution multi-spectral (HR-MS), pan-sharpened image.
Due to differences in sensor location, characteristics, and acquisition time, PAN and MS image pairs often exhibit various amounts of misalignment.
We propose shift-invariant pan-sharpening with moving object alignment (SIPSA-Net), the first method to take such large misalignment of moving object regions into account for pan-sharpening.
arXiv Detail & Related papers (2021-05-06T02:27:50Z) - Fast and High-Quality Blind Multi-Spectral Image Pansharpening [48.68143888901669]
We propose a fast approach to blind pansharpening and achieve state-of-the-art image reconstruction quality.
To achieve fast blind pansharpening, we decouple the estimation of the blur kernel from that of the HRMS image.
Our algorithm outperforms state-of-the-art model-based counterparts in terms of both computational time and reconstruction quality.
arXiv Detail & Related papers (2021-03-17T23:12:14Z)