Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring
- URL: http://arxiv.org/abs/2508.07369v1
- Date: Sun, 10 Aug 2025 14:39:18 GMT
- Title: Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring
- Authors: Tianyu Xin, Jin-Liang Xiao, Zeyu Xia, Shan Yin, Liang-Jian Deng,
- Abstract summary: Existing methods to tackle cross-sensor degradation include retraining model or zero-shot methods.<n>Our method first performs modular decomposition on deep learning-based pansharpening models.<n>A Feature Tailor is then integrated at this interface to address cross-sensor degradation at the feature level.
- Score: 7.471505633354803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning methods for pansharpening have advanced rapidly, yet models pretrained on data from a specific sensor often generalize poorly to data from other sensors. Existing methods to tackle such cross-sensor degradation include retraining model or zero-shot methods, but they are highly time-consuming or even need extra training data. To address these challenges, our method first performs modular decomposition on deep learning-based pansharpening models, revealing a general yet critical interface where high-dimensional fused features begin mapping to the channel space of the final image. % may need revisement A Feature Tailor is then integrated at this interface to address cross-sensor degradation at the feature level, and is trained efficiently with physics-aware unsupervised losses. Moreover, our method operates in a patch-wise manner, training on partial patches and performing parallel inference on all patches to boost efficiency. Our method offers two key advantages: (1) $\textit{Improved Generalization Ability}$: it significantly enhance performance in cross-sensor cases. (2) $\textit{Low Generalization Cost}$: it achieves sub-second training and inference, requiring only partial test inputs and no external data, whereas prior methods often take minutes or even hours. Experiments on the real-world data from multiple datasets demonstrate that our method achieves state-of-the-art quality and efficiency in tackling cross-sensor degradation. For example, training and inference of $512\times512\times8$ image within $\textit{0.2 seconds}$ and $4000\times4000\times8$ image within $\textit{3 seconds}$ at the fastest setting on a commonly used RTX 3090 GPU, which is over 100 times faster than zero-shot methods.
Related papers
- SWIFT: A General Sensitive Weight Identification Framework for Fast Sensor-Transfer Pansharpening [16.578857961692716]
Pansharpening aims to fuse high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate high-resolution multispectral (HRMS) images.<n>Deep learning-based methods have achieved promising performance, but they generally suffer from severe performance degradation when applied to data from unseen sensors.<n>We propose a fast and general-purpose framework for cross-sensor adaptation, SWIFT.
arXiv Detail & Related papers (2025-07-27T15:06:05Z) - CAT: A Conditional Adaptation Tailor for Efficient and Effective Instance-Specific Pansharpening on Real-World Data [7.471505633354803]
We propose an efficient framework that adapts to a specific input instance, completing both training and inference in a short time.<n>Our method achieves state-of-the-art performance on cross-sensor real-world data, while achieving both training and inference of $512times512$ image within $textit0.4 seconds$.
arXiv Detail & Related papers (2025-04-14T14:04:55Z) - Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z) - Practical cross-sensor color constancy using a dual-mapping strategy [0.0]
The proposed method uses a dual-mapping strategy and only requires a simple white point from a test sensor under a D65 condition.
In the second mapping phase, we transform the re-constructed image data into sparse features, which are then optimized with a lightweight multi-layer perceptron (MLP) model.
This approach effectively reduces sensor discrepancies and delivers performance on par with leading cross-sensor methods.
arXiv Detail & Related papers (2023-11-20T13:58:59Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re- parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
scrfdf34 outperforms the best competitor, TinaFace, by $3.86%$ (AP at hard set) while being more than emph3$times$ faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z) - Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote
Sensing Data [64.40187171234838]
Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of re-mote sensing representations.
SeCo will be made public to facilitate transfer learning and enable rapid progress in re-mote sensing applications.
arXiv Detail & Related papers (2021-03-30T18:26:39Z) - Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a emphdisplacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z) - Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive
Evaluation [14.227261503586599]
This paper summarises the results of DFUC 2020 by comparing the deep learning-based algorithms proposed by the winning teams.
The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434.
arXiv Detail & Related papers (2020-10-07T11:31:27Z) - SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it presents state-of-the-art result and real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z) - Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality.
Due to challenges in obtaining real world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data.
We propose a Gaussian Process-based semi-supervised learning framework which enables the network in learning to derain using synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.