Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise
Semantic Labeling
- URL: http://arxiv.org/abs/2201.01953v2
- Date: Sun, 9 Jan 2022 05:23:03 GMT
- Title: Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise
Semantic Labeling
- Authors: Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren
Li
- Abstract summary: Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content by assigning a semantic label to every pixel of the image.
We present a large-scale scene classification dataset that contains one million aerial images termed Million-AID.
We also report benchmarking experiments using classical convolutional neural networks (CNNs) to achieve pixel-wise semantic labeling.
- Score: 48.30060717413166
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Given an aerial image, aerial scene parsing (ASP) aims to interpret the
semantic structure of the image content, e.g., by assigning a semantic label to
every pixel of the image. With the popularization of data-driven methods, the
past decades have witnessed promising progress on ASP, approached either
through tile-level scene classification or through segmentation-based analysis
of high-resolution aerial images.
However, the former scheme often produces results with tile-wise boundaries,
while the latter requires modeling the complex mapping from pixels to
semantics, which in turn demands large-scale image samples with well-annotated
pixel-wise semantic labels. In this paper, we address these issues in ASP,
with perspectives from tile-level scene classification to pixel-wise semantic
labeling. Specifically, we first revisit aerial image interpretation by a
literature review. We then present a large-scale scene classification dataset
that contains one million aerial images termed Million-AID. With the presented
dataset, we also report benchmarking experiments using classical convolutional
neural networks (CNNs). Finally, we perform ASP by unifying the tile-level
scene classification and object-based image analysis to achieve pixel-wise
semantic labeling. Intensive experiments show that Million-AID is a challenging
yet useful dataset, which can serve as a benchmark for evaluating newly
developed algorithms. When transferring knowledge from Million-AID, CNN models
pretrained on it and then fine-tuned consistently outperform their
ImageNet-pretrained counterparts for aerial scene classification. Moreover, our
hierarchical multi-task learning method achieves state-of-the-art
pixel-wise classification on the challenging GID, bridging the tile-level scene
classification toward pixel-wise semantic labeling for aerial image
interpretation.
Related papers
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm for training semantic segmentation models from query-point annotations.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Learning to Annotate Part Segmentation with Gradient Matching [58.100715754135685]
This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN.
In particular, we formulate the annotator learning as a learning-to-learn problem.
We show that our method can learn annotators from a broad range of labelled images including real images, generated images, and even analytically rendered images.
arXiv Detail & Related papers (2022-11-06T01:29:22Z)
- A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation [40.27705176115985]
Few-shot semantic segmentation addresses the learning task in which only few images with ground truth pixel-level labels are available for the novel classes of interest.
We propose a novel meta-learning framework, which predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
Our proposed learning model can be viewed as a pixel-level meta-learner.
arXiv Detail & Related papers (2021-11-02T08:28:11Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmarks, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
- Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv Detail & Related papers (2021-07-02T09:46:06Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAMs and AL methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- Automatic Image Labelling at Pixel Level [21.59653873040243]
We propose an interesting learning approach to generate pixel-level image labellings automatically.
A Guided Filter Network (GFN) is first developed to learn the segmentation knowledge from a source domain.
GFN then transfers such segmentation knowledge to generate coarse object masks in the target domain.
arXiv Detail & Related papers (2020-07-15T00:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.