Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark
- URL: http://arxiv.org/abs/2603.00543v1
- Date: Sat, 28 Feb 2026 08:44:34 GMT
- Title: Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark
- Authors: Ke Cao, Xuanhua He, Xueheng Li, Lingting Zhu, Yingying Wang, Ao Ma, Zhanjie Zhang, Man Zhou, Chengjun Xie, Jie Zhang,
- Abstract summary: Pansharpening aims to generate high-resolution multi-spectral images by fusing the spatial detail of panchromatic images with the spectral richness of low-resolution MS data. Existing methods are evaluated under limited, low-resolution settings, limiting their generalization to real-world, high-resolution scenarios. We introduce PanScale, the first large-scale, cross-scale pansharpening dataset, accompanied by PanScale-Bench, a benchmark for evaluating generalization across varying resolutions and scales.
- Score: 39.78977567741962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pansharpening aims to generate high-resolution multi-spectral images by fusing the spatial detail of panchromatic images with the spectral richness of low-resolution multi-spectral (MS) data. However, most existing methods are evaluated under restricted, low-resolution settings, limiting their generalization to real-world, high-resolution scenarios. To bridge this gap, we systematically investigate the data, algorithmic, and computational challenges of cross-scale pansharpening. We first introduce PanScale, the first large-scale, cross-scale pansharpening dataset, accompanied by PanScale-Bench, a comprehensive benchmark for evaluating generalization across varying resolutions and scales. To realize scale generalization, we propose ScaleFormer, a novel architecture designed for multi-scale pansharpening. ScaleFormer reframes generalization across image resolutions as generalization across sequence lengths: it tokenizes images into patch sequences of the same resolution but variable length proportional to image scale. A Scale-Aware Patchify module enables training for such variations from fixed-size crops. ScaleFormer then decouples intra-patch spatial feature learning from inter-patch sequential dependency modeling, incorporating Rotary Positional Encoding to enhance extrapolation to unseen scales. Extensive experiments show that our approach outperforms SOTA methods in fusion quality and cross-scale generalization. The datasets and source code are available upon acceptance.
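The central idea in the abstract — fixed-size patch tokens whose count grows with image resolution, plus rotary positional encoding for length extrapolation — can be sketched as follows. This is a minimal illustration reconstructed from the abstract alone, not the authors' implementation; all function names and shapes are hypothetical.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into fixed-size patches.

    Every patch has the same resolution (patch x patch), but a
    larger image yields a proportionally longer token sequence,
    so resolution generalization becomes sequence-length
    generalization.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    return (image
            .reshape(h // patch, patch, w // patch, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, patch, patch, c))  # (num_patches, patch, patch, C)

def rope(x, base=10000.0):
    """Apply Rotary Positional Encoding along the sequence axis.

    x: (seq_len, dim) with even dim. Positions beyond those seen
    in training still receive well-defined rotations, which is the
    property that helps extrapolation to longer (larger-scale)
    sequences.
    """
    seq, dim = x.shape
    pos = np.arange(seq)[:, None]
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    ang = pos * freqs                      # (seq, dim/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[:, 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

# A 256x256 image gives 256 tokens; a 512x512 image gives 1024.
small = patchify(np.zeros((256, 256, 4)))
large = patchify(np.zeros((512, 512, 4)))
```

Because RoPE is a per-pair rotation, it preserves token norms while encoding position multiplicatively, which is why the same weights can be evaluated on sequences longer than any seen during training.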
Related papers
- Universal Pansharpening Foundation Model [67.10467574892282]
Pansharpening generates the high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image and spectral attributes from a low-resolution MS image. We present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion.
arXiv Detail & Related papers (2026-03-04T08:30:15Z)
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
- MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Image Segmentation [26.667974865352708]
MROVSeg is a multi-resolution training framework for open-vocabulary image segmentation with a single pretrained CLIP backbone. It uses sliding windows to slice the high-resolution input into uniform patches, each matching the input size of the well-trained image encoder.
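A sliding-window slicing scheme like the one MROVSeg describes — cutting a high-resolution input into uniform crops that each match the encoder's fixed input size — could be sketched as follows. The function name, parameters, and the overlap choice are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sliding_windows(image, win=224, stride=224):
    """Slice an (H, W, C) image into uniform win x win crops.

    Each crop matches a fixed encoder input size; setting
    stride < win produces overlapping windows.
    """
    h, w, c = image.shape
    crops = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            crops.append(image[y:y + win, x:x + win])
    return np.stack(crops)  # (num_windows, win, win, C)

# A 448x448 image with 224-pixel windows and stride 112 yields 9 crops.
crops = sliding_windows(np.zeros((448, 448, 3)), stride=112)
```

Each crop can then be fed to the encoder independently, with per-window predictions stitched back together at the original resolution.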
arXiv Detail & Related papers (2024-08-27T04:45:53Z)
- Learning Images Across Scales Using Adversarial Training [64.59447233902735]
We devise a novel paradigm for learning a representation that captures an orders-of-magnitude variety of scales from an unstructured collection of ordinary images.
We show that our generator can be used as a multiscale generative model, and for reconstructions of scale spaces from unstructured patches.
arXiv Detail & Related papers (2024-06-13T08:44:12Z)
- HeightFormer: A Multilevel Interaction and Image-adaptive Classification-regression Network for Monocular Height Estimation with Aerial Images [10.716933766055755]
This paper presents a comprehensive solution for monocular height estimation in remote sensing.
It features the Multilevel Interaction Backbone (MIB) and Image-adaptive Classification-regression Height Generator (ICG)
The ICG dynamically generates height partition for each image and reframes the traditional regression task.
arXiv Detail & Related papers (2023-10-12T02:49:00Z)
- Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning [55.762840052788945]
We present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales.
We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery.
arXiv Detail & Related papers (2022-12-30T03:15:34Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel architecture named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Learning deep multiresolution representations for pansharpening [4.469255274378329]
This paper proposes a pyramid based deep fusion framework that preserves spectral and spatial characteristics at different scales.
Experiments suggest that the proposed architecture outperforms state of the art pansharpening models.
arXiv Detail & Related papers (2021-02-16T19:41:57Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple streams of scale-associated information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.