Related papers: SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

URL: http://arxiv.org/abs/2401.05093v1
Date: Wed, 10 Jan 2024 11:55:58 GMT
Title: SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image
Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li
Abstract summary: SwiMDiff is a novel self-supervised pre-training framework for remote sensing images. It recalibrates labels to recognize data from the same scene as false negatives. It seamlessly integrates contrastive learning (CL) with a diffusion model.
Score: 21.596874679058327
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL often mistakenly identifies geographically adjacent samples with similar semantic content as negative pairs, leading to confusion during model training. Secondly, as an instance-level discriminative task, it tends to neglect the essential fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pre-training framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that effectively recalibrates labels to recognize data from the same scene as false negatives. This adjustment makes CL more applicable to the nuances of remote sensing. Additionally, SwiMDiff seamlessly integrates CL with a diffusion model. Through the implementation of pixel-level diffusion constraints, we enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images more comprehensively. Our proposed framework significantly enriches the information available for downstream tasks in remote sensing. Demonstrating exceptional performance in change detection and land-cover classification tasks, SwiMDiff proves its substantial utility and value in the field of remote sensing.
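The false-negative problem described in the abstract can be illustrated with a small sketch. The abstract gives no loss function, so this is only an assumption-laden illustration: it uses a standard InfoNCE-style contrastive loss and simply drops keys that share a scene with the query from the set of negatives. The function name `scene_masked_info_nce`, the `scene_ids` argument, and the temperature value are hypothetical, and whether SwiMDiff excludes same-scene samples or re-labels them in some other way is not stated here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def scene_masked_info_nce(queries, keys, scene_ids, temperature=0.2):
    """InfoNCE-style loss that skips same-scene keys as negatives.

    queries[i] and keys[i] are embeddings of two views of sample i.
    scene_ids[i] is a hypothetical scene label for sample i (an assumption,
    not something defined in the abstract).
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    losses = []
    for i in range(len(q)):
        positive = math.exp(dot(q[i], k[i]) / temperature)
        denominator = positive
        for j in range(len(k)):
            # Keys from the same scene are likely false negatives, so they
            # are left out of the denominator instead of being pushed away.
            if j == i or scene_ids[j] == scene_ids[i]:
                continue
            denominator += math.exp(dot(q[i], k[j]) / temperature)
        losses.append(-math.log(positive / denominator))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    keys = [[1.0, 0.1], [1.0, 0.0], [0.1, 1.0]]
    # Samples 0 and 1 come from the same (hypothetical) scene.
    print(scene_masked_info_nce(queries, keys, scene_ids=[0, 0, 1]))
    # Giving every sample its own scene recovers plain InfoNCE.
    print(scene_masked_info_nce(queries, keys, scene_ids=[0, 1, 2]))
```

Because masking only removes terms from the denominator, the masked loss is never larger than the unmasked one for the same embeddings; the pixel-level diffusion constraint mentioned in the abstract would be an additional loss term and is not sketched here.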

Related papers

ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation [23.40908829241552]
We propose a novel task called zero-shot remote sensing instance segmentation, aimed at identifying aerial objects that are absent from training data. We introduce a knowledge-injected adaptation strategy that decouples semantic-related information to preserve the pretrained vision-language alignment. We establish new experimental protocols and benchmarks, and extensive experiments convincingly demonstrate that ZoRI achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-12-17T11:00:56Z)
GLRT-Based Metric Learning for Remote Sensing Object Retrieval [19.210692452537007]
Existing CBRSOR methods neglect the utilization of global statistical information during both training and test stages. Inspired by the Neyman-Pearson theorem, we propose a generalized likelihood ratio test-based metric learning (GLRTML) approach.
arXiv Detail & Related papers (2024-10-08T07:53:30Z)
IRASNet: Improved Feature-Level Clutter Reduction for Domain Generalized SAR-ATR [8.857297839399193]
This study proposes a framework particularly designed for domain-generalized SAR-ATR called IRASNet. IRASNet enables effective feature-level clutter reduction and domain-invariant feature learning. IRASNet not only enhances performance but also significantly improves feature-level clutter reduction, making it a valuable advancement in the field of radar image pattern recognition.
arXiv Detail & Related papers (2024-09-25T11:53:58Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning [14.888569402903562]
Integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees. We propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation.
arXiv Detail & Related papers (2023-07-14T15:07:30Z)
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables models to learn representations from orders of magnitude more unlabelled data. This work designs a novel SSL framework capable of learning representations from both the spectral and spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z)
De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition. There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules. We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains. We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied, classification-based methodology for rotation detection. We propose new techniques to push its frontier in two aspects. Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
Remote Sensing Image Scene Classification with Self-Supervised Paradigm under Limited Labeled Samples [11.025191332244919]
We introduce a new self-supervised learning (SSL) mechanism to obtain a high-performance pre-training model for RSI scene classification from large unlabeled data. Experiments on three commonly used RSI scene classification datasets demonstrate that this new learning paradigm outperforms the traditional dominant ImageNet pre-trained model. The insights distilled from our studies can help to foster the development of SSL in the remote sensing community.
arXiv Detail & Related papers (2020-10-02T09:27:19Z)
X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet. X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features at the top of the network. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.