OpticalRS-4M: Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset
- URL: http://arxiv.org/abs/2406.11933v3
- Date: Fri, 30 Aug 2024 15:08:13 GMT
- Title: OpticalRS-4M: Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset
- Authors: Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun,
- Abstract summary: We present a new pre-training pipeline for RS models, featuring the creation of a large-scale RS dataset and an efficient MIM approach.
We curated a high-quality dataset named OpticalRS-4M by collecting publicly available RS datasets and processing them through exclusion, slicing, and deduplication.
Experiments demonstrate that OpticalRS-4M significantly improves classification, detection, and segmentation performance, while SelectiveMAE increases training efficiency over 2 times.
- Score: 66.15872913664407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Image Modeling (MIM) has become an essential method for building foundational visual models in remote sensing (RS). However, the limitations in size and diversity of existing RS datasets restrict the ability of MIM methods to learn generalizable representations. Additionally, conventional MIM techniques, which require reconstructing all tokens, introduce unnecessary computational overhead. To address these issues, we present a new pre-training pipeline for RS models, featuring the creation of a large-scale RS dataset and an efficient MIM approach. We curated a high-quality dataset named OpticalRS-4M by collecting publicly available RS datasets and processing them through exclusion, slicing, and deduplication. OpticalRS-4M comprises 4 million optical images covering various RS tasks, such as object detection and pixel segmentation. To enhance efficiency, we propose SelectiveMAE, a pre-training method that dynamically encodes and reconstructs semantically rich patch tokens, thereby reducing the inefficiencies of traditional MIM models caused by redundant background pixels in RS images. Extensive experiments demonstrate that OpticalRS-4M significantly improves classification, detection, and segmentation performance, while SelectiveMAE increases training efficiency over 2 times. This highlights the effectiveness and scalability of our pipeline in developing RS foundational models.
Related papers
- Learning De-Biased Representations for Remote-Sensing Imagery [27.218010145001312]
Remote sensing (RS) imagery requires specialized satellites to collect and be difficult to annotate.
Due to data scarcity, training any large-scale RS models from scratch is unrealistic.
We propose debLoRA, a generic training approach that works with any LoRA variants to yield debiased features.
arXiv Detail & Related papers (2024-10-06T16:47:30Z) - A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder [26.81539884309151]
Remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information.
Despite various pre-training methods tailored to the characteristics of RS data, a key limitation persists: the inability to effectively integrate spatial, temporal, and spectral information within a single unified model.
We propose an Anchor-Aware Masked AutoEncoder method (A$2$-MAE), leveraging intrinsic complementary information from the different kinds of images and geo-information to reconstruct the masked patches during the pre-training phase.
arXiv Detail & Related papers (2024-06-12T11:02:15Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - On quantifying and improving realism of images generated with diffusion [50.37578424163951]
We propose a metric, called Image Realism Score (IRS), computed from five statistical measures of a given image.
IRS is easily usable as a measure to classify a given image as real or fake.
We experimentally establish the model- and data-agnostic nature of the proposed IRS by successfully detecting fake images generated by Stable Diffusion Model (SDM), Dalle2, Midjourney and BigGAN.
Our efforts have also led to Gen-100 dataset, which provides 1,000 samples for 100 classes generated by four high-quality models.
arXiv Detail & Related papers (2023-09-26T08:32:55Z) - RingMo-lite: A Remote Sensing Multi-task Lightweight Network with
CNN-Transformer Hybrid Framework [15.273362355253779]
This paper proposes RingMo-lite, an RS multi-task lightweight network with a CNN-Transformer hybrid framework to optimize the interpretation process.
The proposed RingMo-lite reduces the parameters over 60% in various RS image interpretation tasks, the average accuracy drops by less than 2% in most of the scenes and achieves SOTA performance compared to models of the similar size.
arXiv Detail & Related papers (2023-09-16T14:15:59Z) - Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction.
To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures.
Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
arXiv Detail & Related papers (2023-08-01T03:44:56Z) - PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, ourmethod, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z) - Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote
Sensing Image Retrieval [21.05804942940532]
Cross-modal text-image retrieval has attracted extensive attention for its advantages of flexible input and efficient query.
To cope with the problem of multi-scale scarcity and target redundancy in RS multimodal retrieval task, we come up with a novel asymmetric multimodal feature matching network (AMFMN)
Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features.
arXiv Detail & Related papers (2022-04-21T03:53:19Z) - DML-GANR: Deep Metric Learning With Generative Adversarial Network
Regularization for High Spatial Resolution Remote Sensing Image Retrieval [9.423185775609426]
We develop a deep metric learning approach with generative adversarial network regularization (DML-GANR) for HSR-RSI retrieval.
The experimental results on the three data sets demonstrate the superior performance of DML-GANR over state-of-the-art techniques in HSR-RSI retrieval.
arXiv Detail & Related papers (2020-10-07T02:26:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.