Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
- URL: http://arxiv.org/abs/2212.14532v4
- Date: Fri, 22 Sep 2023 02:34:12 GMT
- Title: Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
- Authors: Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher
Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor
Darrell
- Abstract summary: We present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales.
We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery.
- Score: 55.762840052788945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large, pretrained models are commonly finetuned with imagery that is heavily
augmented to mimic different conditions and scales, with the resulting models
used for various tasks with imagery from a range of spatial scales. Such models
overlook scale-specific information in the data for scale-dependent domains,
such as remote sensing. In this paper, we present Scale-MAE, a pretraining
method that explicitly learns relationships between data at different, known
scales throughout the pretraining process. Scale-MAE pretrains a network by
masking an input image at a known input scale, where the area of the Earth
covered by the image determines the scale of the ViT positional encoding, not
the image resolution. Scale-MAE encodes the masked image with a standard ViT
backbone, and then decodes the masked image through a bandpass filter to
reconstruct low/high frequency images at lower/higher scales. We find that
tasking the network with reconstructing both low/high frequency images leads to
robust multiscale representations for remote sensing imagery. Scale-MAE
achieves an average $2.4$-$5.6\%$ non-parametric kNN classification
improvement across eight remote sensing datasets compared to the current
state of the art, and obtains a $0.9$ to $1.7$ mIoU improvement on the
SpaceNet building segmentation transfer task across a range of evaluation scales.
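The abstract names two mechanisms: a ViT positional encoding whose scale is set by the ground area the image covers (its ground sample distance, GSD) rather than its pixel count, and a decoder that reconstructs low- and high-frequency targets. The sketch below is a minimal illustration of both ideas, not the authors' implementation: the function names, the reference GSD of 1.0 m/pixel, and the Laplacian-style decomposition via bilinear resampling are all assumptions.

```python
import torch
import torch.nn.functional as F

def gsd_positional_encoding(num_patches_per_side: int, dim: int,
                            gsd: float, reference_gsd: float = 1.0) -> torch.Tensor:
    """Sinusoidal encoding for one axis, with patch positions scaled by the
    ratio of the image's GSD to a reference GSD, so the encoding reflects
    ground area covered rather than pixel count."""
    pos = torch.arange(num_patches_per_side, dtype=torch.float32)
    pos = pos * (gsd / reference_gsd)             # scale-aware positions
    i = torch.arange(0, dim, 2, dtype=torch.float32)
    freq = 1.0 / (10000.0 ** (i / dim))
    angles = pos[:, None] * freq[None, :]         # (positions, dim/2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)  # (positions, dim)

def bandpass_targets(img: torch.Tensor, low_scale: float = 0.25):
    """Laplacian-pyramid-style split: a low-frequency target at a coarser
    scale and a high-frequency residual at the input scale."""
    h, w = img.shape[-2:]
    low = F.interpolate(img, scale_factor=low_scale, mode="bilinear",
                        align_corners=False)      # low-frequency target
    high = img - F.interpolate(low, size=(h, w), mode="bilinear",
                               align_corners=False)  # high-frequency residual
    return low, high

# Two images with identical 14x14 patch grids but different ground coverage
# receive different positional encodings.
pe_fine = gsd_positional_encoding(14, 64, gsd=0.3)   # 0.3 m/pixel imagery
pe_coarse = gsd_positional_encoding(14, 64, gsd=1.0) # 1.0 m/pixel imagery
img = torch.randn(1, 3, 224, 224)
low, high = bandpass_targets(img)
print(pe_fine.shape, low.shape, high.shape)
```

Because positions are multiplied by `gsd / reference_gsd`, two crops of the same pixel size that cover different ground extents are encoded differently, which is exactly the scale information a resolution-only positional encoding discards.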
Related papers
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), consisting of multiple scales, a unified network, and a scale-invariant constraint.
MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation [7.063322114865965]
We propose a deep-learning-based region-merging method, dubbed DeepMerge, to handle the segmentation of complete objects in large very-high-resolution (VHR) images.
It is the first method to use deep learning to learn the similarity between, and merge, adjacent superpixels in a region adjacency graph (RAG); a classical analogue of this merging step is sketched after this entry.
DeepMerge achieves the highest F value (0.9550) and the lowest total error TE (0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods.
arXiv Detail & Related papers (2023-05-31T12:27:58Z)
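For concreteness, here is a classical, non-learned analogue of the merging step DeepMerge automates: superpixels are generated, a RAG is built with mean-color distance as the edge weight, and adjacent regions whose edge weight falls below a threshold are merged. This sketch uses scikit-image (assuming a version ≥ 0.20, where the `graph` module is top-level); DeepMerge's contribution is to replace the handcrafted color distance with a learned similarity, which this sketch does not do.

```python
from skimage import data, segmentation, graph

# Over-segment into superpixels, build a region adjacency graph (RAG)
# weighted by mean-color distance, then merge adjacent regions whose
# edge weight falls below a threshold.
image = data.astronaut()
superpixels = segmentation.slic(image, n_segments=400, compactness=10,
                                start_label=1)
rag = graph.rag_mean_color(image, superpixels)
merged = graph.cut_threshold(superpixels, rag, thresh=29)
print(f"{superpixels.max()} superpixels merged into "
      f"{merged.max() + 1} regions")
```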
- M$^2$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$^2$SNet) to perform diverse segmentation tasks on medical images; a minimal sketch of the underlying subtraction unit follows this entry.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets covering four medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z)
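The summary above only names the "multi-scale subtraction" design, so the following is a hedged sketch of a single subtraction unit of the kind used in the MSNet/M$^2$SNet family: the coarser feature map is upsampled to match the finer one and their absolute difference is convolved, exposing complementary detail between adjacent encoder scales. The module name, channel width, and feature shapes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionUnit(nn.Module):
    """Element-wise subtraction between two feature maps followed by a conv:
    a minimal stand-in for the subtraction units that M^2SNet stacks across
    (and within) encoder scales."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, fa: torch.Tensor, fb: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser map so the two scales align, then take the
        # absolute difference to expose complementary (detail) information.
        fb = F.interpolate(fb, size=fa.shape[-2:], mode="bilinear",
                           align_corners=False)
        return self.conv(torch.abs(fa - fb))

# Example: features from two adjacent encoder stages (assumed shapes).
unit = SubtractionUnit(channels=64)
f_fine = torch.randn(1, 64, 56, 56)    # higher-resolution stage
f_coarse = torch.randn(1, 64, 28, 28)  # lower-resolution stage
print(unit(f_fine, f_coarse).shape)    # torch.Size([1, 64, 56, 56])
```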
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to scale-space theory and build a novel network named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, its inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Bidirectional Multi-scale Attention Networks for Semantic Segmentation of Oblique UAV Imagery [30.524771772192757]
We propose the novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction.
Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
arXiv Detail & Related papers (2021-02-05T11:02:15Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach selectively ignores various kinds of noise and automatically focuses on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)