SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion
Constraint for Remote Sensing Image
- URL: http://arxiv.org/abs/2401.05093v1
- Date: Wed, 10 Jan 2024 11:55:58 GMT
- Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li
- Abstract summary: SwiMDiff is a novel self-supervised pre-training framework for remote sensing images.
It recalibrates labels to recognize data from the same scene as false negatives.
It seamlessly integrates contrastive learning (CL) with a diffusion model.
- Score: 21.596874679058327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With recent advancements in aerospace technology, the volume of unlabeled
remote sensing image (RSI) data has increased dramatically. Effectively
leveraging this data through self-supervised learning (SSL) is vital in the
field of remote sensing. However, current methodologies, particularly
contrastive learning (CL), a leading SSL method, encounter specific challenges
in this domain. Firstly, CL often mistakenly identifies geographically adjacent
samples with similar semantic content as negative pairs, leading to confusion
during model training. Secondly, as an instance-level discriminative task, it
tends to neglect the essential fine-grained features and complex details
inherent in unstructured RSIs. To overcome these obstacles, we introduce
SwiMDiff, a novel self-supervised pre-training framework designed for RSIs.
SwiMDiff employs a scene-wide matching approach that effectively recalibrates
labels to recognize data from the same scene as false negatives. This
adjustment makes CL more applicable to the nuances of remote sensing.
Additionally, SwiMDiff seamlessly integrates CL with a diffusion model. Through
the implementation of pixel-level diffusion constraints, we enhance the
encoder's ability to capture both the global semantic information and the
fine-grained features of the images more comprehensively. Our proposed
framework significantly enriches the information available for downstream tasks
in remote sensing. Demonstrating exceptional performance in change detection
and land-cover classification tasks, SwiMDiff proves its substantial utility
and value in the field of remote sensing.
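The scene-wide matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that every image carries an integer scene identifier (the hypothetical `scene_ids` argument) and that false negatives are handled by dropping them from the contrastive denominator, which is one common option. The abstract does not say whether SwiMDiff drops such pairs or re-labels them, and the function name, temperature value, and use of NumPy are all illustrative choices. The diffusion constraint is not covered here.

```python
import numpy as np


def scene_aware_info_nce(z_a, z_b, scene_ids, temperature=0.1):
    """InfoNCE over two views that ignores same-scene samples as negatives.

    z_a, z_b  : (N, D) embeddings of two augmented views of the same N images.
    scene_ids : (N,) integer scene label per image (assumed to be available).
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) scaled similarities

    # Pairs (i, j) with i != j that come from the same scene are treated as
    # false negatives rather than true negatives.
    same_scene = scene_ids[:, None] == scene_ids[None, :]
    positives = np.eye(len(scene_ids), dtype=bool)
    false_negatives = same_scene & ~positives

    # Assumption: remove false negatives from the softmax denominator.
    logits = np.where(false_negatives, -np.inf, logits)

    # Numerically stable log-sum-exp per row; the diagonal is never masked,
    # so every row keeps at least one finite entry.
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    log_prob_positive = np.diag(logits) - log_denominator
    return -log_prob_positive.mean()
```

When all scene identifiers are distinct, the function reduces to the standard two-view InfoNCE loss; grouping samples into scenes can only shrink the denominator, so the loss never increases relative to the unmasked case.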
Related papers
- GLRT-Based Metric Learning for Remote Sensing Object Retrieval [19.210692452537007]
Existing CBRSOR methods neglect the utilization of global statistical information during both training and test stages.
Inspired by the Neyman-Pearson theorem, we propose a generalized likelihood ratio test-based metric learning (GLRTML) approach.
(2024-10-08T07:53:30Z)
- IRASNet: Improved Feature-Level Clutter Reduction for Domain Generalized SAR-ATR [11.197991954581155]
This study proposes a framework particularly designed for domain-generalized SAR-ATR called IRASNet.
IRASNet enables effective feature-level clutter reduction and domain-invariant feature learning.
IRASNet not only enhances performance but also significantly improves feature-level clutter reduction, making it a valuable advancement in the field of radar image pattern recognition.
(2024-09-25T11:53:58Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
(2023-12-19T08:14:14Z)
- L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated
Self-Supervised Visual Representation Learning [14.888569402903562]
Integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees.
We propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation.
(2023-07-14T15:07:30Z)
- A generic self-supervised learning (SSL) framework for representation
learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables the models to learn a representation from orders of magnitude more unlabelled data.
This work designs a novel SSL framework capable of learning representations from both the spectral and spatial information of unlabelled data.
(2023-06-27T23:50:43Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
(2023-03-29T18:07:25Z)
- Robust Semi-supervised Federated Learning for Images Automatic
Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
(2022-01-03T16:49:33Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
(2021-12-12T06:11:16Z)
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
(2020-11-19T05:42:02Z)
- Remote Sensing Image Scene Classification with Self-Supervised Paradigm
under Limited Labeled Samples [11.025191332244919]
We introduce a new self-supervised learning (SSL) mechanism to obtain a high-performance pre-training model for RSI scene classification from large unlabeled data.
Experiments on three commonly used RSI scene classification datasets demonstrated that this new learning paradigm outperforms the traditional dominant ImageNet pre-trained model.
The insights distilled from our studies can help to foster the development of SSL in the remote sensing community.
(2020-10-02T09:27:19Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for
Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
(2020-06-24T15:29:41Z)