Semantic Labeling of Large-Area Geographic Regions Using Multi-View and
Multi-Date Satellite Images and Noisy OSM Training Labels
- URL: http://arxiv.org/abs/2008.10271v5
- Date: Sun, 27 Jun 2021 02:50:21 GMT
- Title: Semantic Labeling of Large-Area Geographic Regions Using Multi-View and
Multi-Date Satellite Images and Noisy OSM Training Labels
- Authors: Bharath Comandur and Avinash C. Kak
- Abstract summary: We present a novel multi-view training framework and CNN architecture for semantically labeling buildings and roads.
Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel multi-view training framework and CNN architecture for
combining information from multiple overlapping satellite images and noisy
training labels derived from OpenStreetMap (OSM) to semantically label
buildings and roads across large geographic regions (100 km$^2$). Our approach
to multi-view semantic segmentation yields a 4-7% improvement in the per-class
IoU scores compared to the traditional approaches that use the views
independently of one another. A unique (and, perhaps, surprising) property of
our system is that modifications that are added to the tail-end of the CNN for
learning from the multi-view data can be discarded at the time of inference
with a relatively small penalty in the overall performance. This implies that
the benefits of training using multiple views are absorbed by all the layers of
the network. Additionally, our approach only adds a small overhead in terms of
the GPU-memory consumption even when training with as many as 32 views per
scene. The system we present is end-to-end automated, which facilitates
comparing the classifiers trained directly on true orthophotos vis-a-vis first
training them on the off-nadir images and subsequently translating the
predicted labels to geographical coordinates. With no human supervision, our
IoU scores for the buildings and roads classes are 0.8 and 0.64, respectively,
which are better than those of state-of-the-art approaches that use OSM labels
and that are not completely automated.
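
To make that inference-time property concrete, here is a minimal PyTorch-style sketch (not the authors' code): a per-view segmentation backbone plus a tail-end fusion module that mixes logits across aligned views during training and is simply dropped at inference. The toy backbone, all layer shapes, and the 1x1-conv fusion design are illustrative assumptions.

```python
# Sketch of a tail-end multi-view module that is trained jointly with a
# per-view backbone and discarded at inference (assumed design, not the
# paper's actual architecture).
import torch
import torch.nn as nn

class SingleViewSegNet(nn.Module):
    """Toy per-view encoder-decoder producing per-pixel class logits."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        return self.head(self.body(x))

class MultiViewFusion(nn.Module):
    """Tail-end module mixing logits across aligned views (training only)."""
    def __init__(self, num_views, num_classes=3):
        super().__init__()
        self.mix = nn.Conv2d(num_views * num_classes, num_classes, 1)

    def forward(self, per_view_logits):            # list of (B, C, H, W)
        return self.mix(torch.cat(per_view_logits, dim=1))

net, fusion = SingleViewSegNet(), MultiViewFusion(num_views=4)
views = [torch.randn(2, 3, 64, 64) for _ in range(4)]  # 4 aligned views
train_logits = fusion([net(v) for v in views])  # training path, all views
infer_logits = net(views[0])                    # inference path, fusion dropped
```

In this toy setup the only extra parameters live in `fusion.mix`, mirroring the abstract's point that the multi-view machinery adds little overhead and can be removed at inference.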
Related papers
- MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning [9.540487697801531]
MMEarth is a diverse multi-modal pretraining dataset at global scale.
We propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images.
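
A rough sketch, under heavy assumptions, of the multi-pretext masked-autoencoder pattern the summary names: one shared encoder over visible patches feeding several task-specific decoders. The patch sizes, the two stand-in pretext heads, and the placeholder loss are hypothetical, not MMEarth's actual modalities.

```python
# Toy multi-pretext masked autoencoder: shared encoder, two decoders.
import torch
import torch.nn as nn

class MPMAESketch(nn.Module):
    def __init__(self, patch_dim=48, embed=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, embed), nn.ReLU())
        self.dec_pixels = nn.Linear(embed, patch_dim)  # reconstruct optical patches
        self.dec_aux = nn.Linear(embed, patch_dim)     # predict an auxiliary modality

    def forward(self, patches, keep_mask):
        z = self.encoder(patches[:, keep_mask])        # encode visible patches only
        return self.dec_pixels(z), self.dec_aux(z)

model = MPMAESketch()
patches = torch.randn(8, 100, 48)              # (batch, patches, patch_dim)
keep = torch.rand(100) > 0.75                  # ~25% of patches stay visible
recon, aux = model(patches, keep)
loss = recon.pow(2).mean() + aux.pow(2).mean() # stand-in for the multi-pretext loss
```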
arXiv Detail & Related papers (2024-05-04T23:16:48Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, yielding a 10-34% relative improvement across various labeled-training-data sampling ratios.
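
A minimal sketch of the dual-encoder contrastive setup described above, assuming simple linear encoders, a toy batch, and a fixed temperature of 0.07; none of this reflects CSP's actual location encoding.

```python
# Dual-encoder contrastive matching of images and geo-locations (InfoNCE).
import torch
import torch.nn as nn
import torch.nn.functional as F

img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
loc_enc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 128))

images = torch.randn(16, 3, 32, 32)
lonlat = torch.rand(16, 2) * 2 - 1        # normalized (lon, lat) pairs

zi = F.normalize(img_enc(images), dim=1)
zl = F.normalize(loc_enc(lonlat), dim=1)
logits = zi @ zl.t() / 0.07               # cosine similarities / temperature
labels = torch.arange(16)                 # i-th image matches i-th location
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
```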
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Enhancing Self-Supervised Learning for Remote Sensing with Elevation Data: A Case Study with Scarce and High Level Semantic Labels [1.534667887016089]
This work proposes a hybrid unsupervised and supervised learning method to pre-train models for Earth observation downstream tasks.
We combine a contrastive pre-training approach with a pixel-wise regression pretext task that predicts coarse elevation maps.
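
The combination described above can be sketched as a two-term loss; the encoder, heads, targets, and the 0.5 weighting below are assumptions for illustration only.

```python
# Contrastive objective plus pixel-wise elevation regression pretext task.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))
elev_head = nn.Conv2d(16, 1, 1)           # coarse per-pixel elevation

x1, x2 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)  # two augmented views
coarse_dem = torch.randn(8, 1, 64, 64)                         # elevation target

f1, f2 = encoder(x1), encoder(x2)
z1, z2 = F.normalize(proj(f1), dim=1), F.normalize(proj(f2), dim=1)
contrastive = F.cross_entropy(z1 @ z2.t() / 0.1, torch.arange(8))
regression = F.mse_loss(elev_head(f1), coarse_dem)
loss = contrastive + 0.5 * regression     # assumed loss weighting
```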
arXiv Detail & Related papers (2023-04-13T23:01:11Z)
- UniFormer: Unifying Convolution and Self-attention for Visual Recognition [69.68907941116127]
Convolutional neural networks (CNNs) and vision transformers (ViTs) have been the two dominant frameworks in the past few years.
We propose a novel Unified transFormer (UniFormer) which seamlessly integrates the merits of convolution and self-attention in a concise transformer format.
Our UniFormer achieves 86.3% top-1 accuracy on ImageNet-1K classification.
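
A toy sketch of the stated convolution-plus-self-attention unification: depthwise convolution mixes tokens locally in shallow stages, multi-head self-attention mixes them globally in deeper ones. Block layout and dimensions are assumed, not UniFormer's published design.

```python
# Local (convolutional) and global (self-attention) token-mixing blocks.
import torch
import torch.nn as nn

class LocalBlock(nn.Module):               # convolutional token mixing
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
    def forward(self, x):
        return x + self.dw(x)

class GlobalBlock(nn.Module):              # self-attention token mixing
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
    def forward(self, x):                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        t = t + self.attn(t, t, t)[0]
        return t.transpose(1, 2).reshape(b, c, h, w)

stage = nn.Sequential(LocalBlock(32), GlobalBlock(32))
out = stage(torch.randn(2, 32, 16, 16))
```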
arXiv Detail & Related papers (2022-01-24T04:39:39Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-sample and multi-level representations.
Our method, termed CsMl, integrates multi-level visual representations across samples in a robust way.
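
One way to picture the multi-level part is to apply the same contrastive objective at several encoder depths; the sketch below does exactly that with a toy two-stage encoder and omits CsMl's cross-sample mining entirely.

```python
# Same InfoNCE objective summed over two feature levels of one encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

stage1 = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
stage2 = nn.Sequential(nn.ReLU(), nn.Linear(64, 64))

def info_nce(a, b, t=0.1):
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    return F.cross_entropy(a @ b.t() / t, torch.arange(a.size(0)))

v1, v2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # two views
h1, h2 = stage1(v1), stage1(v2)                                # shallow level
loss = info_nce(h1, h2) + info_nce(stage2(h1), stage2(h2))     # sum over levels
```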
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation [16.478668565965243]
This paper makes an important step towards automatic annotation by introducing SegProp.
SegProp is a novel iterative flow-based method, with a direct connection to spectral clustering in space and time.
We introduce Ruralscapes, a new dataset with high resolution (4K) images and manually-annotated dense labels every 50 frames.
Our novel SegProp automatically annotates the remaining unlabeled 98% of frames with an accuracy exceeding 90%.
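
A toy numpy sketch of flow-based label propagation in this spirit: labels from two annotated key frames are warped to an intermediate frame and vote per pixel. The integer "flow" fields are fabricated; real optical flow and SegProp's spectral-clustering connection are beyond this illustration.

```python
# Warp key-frame labels along per-pixel flow and vote where they agree.
import numpy as np

def warp_labels(labels, flow):
    """Backward-warp an integer label map by a per-pixel (dy, dx) flow."""
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[..., 0], 0, h - 1)
    src_x = np.clip(xs + flow[..., 1], 0, w - 1)
    return labels[src_y, src_x]

prev_labels = np.random.randint(0, 3, (16, 16))   # annotated frame t-1
next_labels = np.random.randint(0, 3, (16, 16))   # annotated frame t+1
flow_fwd = np.random.randint(-1, 2, (16, 16, 2))  # flow t-1 -> t (toy)
flow_bwd = np.random.randint(-1, 2, (16, 16, 2))  # flow t+1 -> t (toy)

from_prev = warp_labels(prev_labels, flow_fwd)
from_next = warp_labels(next_labels, flow_bwd)
propagated = np.where(from_prev == from_next, from_prev, -1)  # -1 = undecided
```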
arXiv Detail & Related papers (2020-10-02T15:15:50Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
The KGGR framework exploits prior knowledge of statistical label correlations together with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces label semantics to guide the learning of semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
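
A compact sketch of the first and last ingredients named above, a label graph built from co-occurrence counts and one GCN-style propagation step, with all dimensions and the normalization scheme assumed.

```python
# Label graph from co-occurrence statistics + one propagation step.
import torch

cooc = torch.tensor([[10., 4., 0.],      # toy co-occurrence counts
                     [ 4., 8., 2.],      # for 3 labels
                     [ 0., 2., 6.]])
adj = cooc / cooc.sum(dim=1, keepdim=True)          # row-normalized graph

label_emb = torch.randn(3, 16)                      # label semantic features
weight = torch.randn(16, 16)
propagated = torch.relu(adj @ label_emb @ weight)   # one GCN-style step
```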
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
- Efficient Full Image Interactive Segmentation by Leveraging Within-image Appearance Similarity [39.17599924322882]
We propose a new approach to interactive full-image semantic segmentation.
We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge.
We build on this observation and propose an approach capable of jointly propagating pixel labels from multiple classes.
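
The quoted observation can be illustrated with a deliberately crude class-agnostic propagator: each unlabeled pixel takes the class of its nearest labeled pixel in plain RGB space. The real method is far more elaborate.

```python
# Class-agnostic label propagation by nearest-neighbour appearance.
import numpy as np

img = np.random.rand(32, 32, 3)                       # toy image
labels = -np.ones((32, 32), dtype=int)                # -1 = unlabeled
labels[5, 5], labels[25, 20] = 0, 1                   # two user scribbles

lab_pos = np.argwhere(labels >= 0)                    # labeled pixel coords
lab_feat = img[lab_pos[:, 0], lab_pos[:, 1]]          # their colors
lab_cls = labels[lab_pos[:, 0], lab_pos[:, 1]]

flat = img.reshape(-1, 3)
d = ((flat[:, None, :] - lab_feat[None, :, :]) ** 2).sum(-1)
propagated = lab_cls[d.argmin(axis=1)].reshape(32, 32)
```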
arXiv Detail & Related papers (2020-07-16T08:21:59Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
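
The advocated decoupling reads as two independent stages; in the sketch below the self-supervised features are faked with a fixed random projection, and k-means stands in for SCAN's nearest-neighbour clustering objective.

```python
# Two decoupled stages: (1) feature learning, (2) clustering on features.
import numpy as np
from sklearn.cluster import KMeans

images = np.random.rand(200, 3 * 32 * 32)             # toy flattened images

# Stage 1: "self-supervised" feature extractor (random-projection stand-in).
rng = np.random.default_rng(0)
W = rng.standard_normal((3 * 32 * 32, 64))
features = images @ W

# Stage 2: cluster in the learned feature space, not on raw pixels.
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(features)
```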
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
- On the Texture Bias for Few-Shot CNN Segmentation [21.349705243254423]
While Convolutional Neural Networks (CNNs) were initially believed to be driven by shapes in visual recognition tasks, recent evidence suggests that the texture bias in CNNs yields higher-performing models when learning on large labeled training datasets.
We propose a novel architecture that integrates a set of Difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space.
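
A quick numeric illustration of a Difference-of-Gaussians bank, which suppresses the highest-frequency detail; the paper applies this idea in feature space, whereas this sketch operates on a raw array.

```python
# A bank of Difference-of-Gaussians (DoG) band-pass responses.
import numpy as np
from scipy.ndimage import gaussian_filter

feature_map = np.random.rand(64, 64)
sigmas = [(0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]         # assumed DoG scales
bank = [gaussian_filter(feature_map, s1) - gaussian_filter(feature_map, s2)
        for s1, s2 in sigmas]                          # high frequencies attenuated
```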
arXiv Detail & Related papers (2020-03-09T11:55:47Z)
- RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training [77.62171090230986]
We propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method.
In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors.
We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training.
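
As an illustration of depth-derived supervision in general (the paper's actual HN-label construction differs in detail), the sketch below thresholds depth-gradient magnitude into free pseudo-labels for pre-training.

```python
# Pseudo-labels generated automatically from a depth map (assumed scheme).
import numpy as np

depth = np.random.rand(64, 64) * 5.0                  # toy depth map (meters)
gy, gx = np.gradient(depth)
slope = np.hypot(gx, gy)                              # local surface steepness

# 3-class pseudo-labels: flat, sloped, near-vertical surfaces.
pseudo = np.digitize(slope, bins=[0.05, 0.2])
# `pseudo` can now serve as a free supervision signal for pre-training an
# RGB segmentation model before fine-tuning on real labels.
```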
arXiv Detail & Related papers (2020-02-06T11:16:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.