Related papers: Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring

URL: http://arxiv.org/abs/2508.07369v1
Date: Sun, 10 Aug 2025 14:39:18 GMT
Title: Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring
Authors: Tianyu Xin, Jin-Liang Xiao, Zeyu Xia, Shan Yin, Liang-Jian Deng
Abstract summary: Existing methods to tackle cross-sensor degradation include retraining the model or applying zero-shot methods. Our method first performs modular decomposition on deep learning-based pansharpening models, revealing an interface where high-dimensional fused features begin mapping to the channel space of the final image. A Feature Tailor is then integrated at this interface to address cross-sensor degradation at the feature level.
Score: 7.471505633354803
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning methods for pansharpening have advanced rapidly, yet models pretrained on data from a specific sensor often generalize poorly to data from other sensors. Existing methods to tackle such cross-sensor degradation include retraining the model or applying zero-shot methods, but these are highly time-consuming or even require extra training data. To address these challenges, our method first performs modular decomposition on deep learning-based pansharpening models, revealing a general yet critical interface where high-dimensional fused features begin mapping to the channel space of the final image. A Feature Tailor is then integrated at this interface to address cross-sensor degradation at the feature level, and is trained efficiently with physics-aware unsupervised losses. Moreover, our method operates in a patch-wise manner, training on partial patches and performing parallel inference on all patches to boost efficiency. Our method offers two key advantages: (1) $\textit{Improved Generalization Ability}$: it significantly enhances performance in cross-sensor cases. (2) $\textit{Low Generalization Cost}$: it achieves sub-second training and inference, requiring only partial test inputs and no external data, whereas prior methods often take minutes or even hours. Experiments on real-world data from multiple datasets demonstrate that our method achieves state-of-the-art quality and efficiency in tackling cross-sensor degradation. For example, it completes training and inference on a $512\times512\times8$ image within $\textit{0.2 seconds}$ and on a $4000\times4000\times8$ image within $\textit{3 seconds}$ at the fastest setting on a commonly used RTX 3090 GPU, which is over 100 times faster than zero-shot methods.
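The patch-wise scheme mentioned in the abstract (train on a subset of patches, run inference on all patches as one batch) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`split_into_patches`, `merge_patches`, `pick_training_subset`), the patch size of 64, the 25% training fraction, and the identity stand-in for the network are all assumptions made for illustration.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W, C) array into non-overlapping (patch, patch, C) tiles."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "sketch assumes exact tiling"
    rows, cols = h // patch, w // patch
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    tiles = image.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c), (rows, cols)

def merge_patches(tiles, grid):
    """Inverse of split_into_patches: reassemble tiles into one image."""
    rows, cols = grid
    _, p, _, c = tiles.shape
    stacked = tiles.reshape(rows, cols, p, p, c).swapaxes(1, 2)
    return stacked.reshape(rows * p, cols * p, c)

def pick_training_subset(num_patches, fraction, seed=0):
    """Choose a random subset of patch indices (hypothetical selection rule)."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(num_patches * fraction)))
    return rng.choice(num_patches, size=k, replace=False)

# A 512 x 512 x 8 input, matching the image size quoted in the abstract.
image = np.arange(512 * 512 * 8, dtype=np.float32).reshape(512, 512, 8)
tiles, grid = split_into_patches(image, patch=64)

# Only a fraction of the patches would be used for the short adaptation step.
train_idx = pick_training_subset(len(tiles), fraction=0.25)
train_tiles = tiles[train_idx]

# Identity placeholder for the adapted network, applied to all patches
# at once as a single batch (the "parallel inference" step).
output_tiles = tiles * 1.0
restored = merge_patches(output_tiles, grid)
```

In a real pipeline the placeholder would be replaced by a batched forward pass of the adapted model, and overlapping patches or border handling might be needed; neither is covered here.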

Related papers

SWIFT: A General Sensitive Weight Identification Framework for Fast Sensor-Transfer Pansharpening [16.578857961692716]
Pansharpening aims to fuse high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate high-resolution multispectral (HRMS) images. Deep learning-based methods have achieved promising performance, but they generally suffer from severe performance degradation when applied to data from unseen sensors. We propose a fast and general-purpose framework for cross-sensor adaptation, SWIFT.
arXiv Detail & Related papers (2025-07-27T15:06:05Z)
CAT: A Conditional Adaptation Tailor for Efficient and Effective Instance-Specific Pansharpening on Real-World Data [7.471505633354803]
We propose an efficient framework that adapts to a specific input instance, completing both training and inference in a short time. Our method achieves state-of-the-art performance on cross-sensor real-world data, while completing both training and inference of a $512\times512$ image within $\textit{0.4 seconds}$.
arXiv Detail & Related papers (2025-04-14T14:04:55Z)
Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods. We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
Practical cross-sensor color constancy using a dual-mapping strategy [0.0]
The proposed method uses a dual-mapping strategy and only requires a simple white point from a test sensor under a D65 condition. In the second mapping phase, we transform the reconstructed image data into sparse features, which are then optimized with a lightweight multi-layer perceptron (MLP) model. This approach effectively reduces sensor discrepancies and delivers performance on par with leading cross-sensor methods.
arXiv Detail & Related papers (2023-11-20T13:58:59Z)
Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution. Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x. We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues. No human annotations are involved in our framework during the whole training process. Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection. SCRFD-34GF outperforms the best competitor, TinaFace, by $3.86\%$ (AP at hard set) while being more than $3\times$ faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z)
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data [64.40187171234838]
Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of remote sensing representations. SeCo will be made public to facilitate transfer learning and enable rapid progress in remote sensing applications.
arXiv Detail & Related papers (2021-03-30T18:26:39Z)
Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy. But their inference time is typically slow, on the order of seconds for a pair of 540p images. We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive Evaluation [14.227261503586599]
This paper summarises the results of DFUC 2020 by comparing the deep learning-based algorithms proposed by the winning teams. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434.
arXiv Detail & Related papers (2020-10-07T11:31:27Z)
SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector. It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection. Though structurally simple, it presents state-of-the-art results and a real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality. Due to challenges in obtaining real-world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data. We propose a Gaussian Process-based semi-supervised learning framework which enables the network to learn to derain using a synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.