MultiScene: A Large-scale Dataset and Benchmark for Multi-scene
Recognition in Single Aerial Images
- URL: http://arxiv.org/abs/2104.02846v1
- Date: Wed, 7 Apr 2021 01:09:12 GMT
- Authors: Yuansheng Hua, Lichao Mou, Pu Jin, Xiao Xiang Zhu
- Abstract summary: We create a large-scale dataset, called MultiScene, composed of 100,000 high-resolution aerial images.
We visually inspect 14,000 images and correct their scene labels, yielding a subset of cleanly-annotated images, named MultiScene-Clean.
We conduct experiments with extensive baseline models on both MultiScene-Clean and MultiScene to offer benchmarks for multi-scene recognition in single images.
- Score: 17.797726722637634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aerial scene recognition is a fundamental research problem in interpreting
high-resolution aerial imagery. Over the past few years, most studies have
focused on classifying an image into a single scene category, whereas in
real-world scenarios a single image often contains multiple scenes. Therefore, in this
paper, we investigate a more practical yet underexplored task -- multi-scene
recognition in single images. To this end, we create a large-scale dataset,
called MultiScene, composed of 100,000 unconstrained high-resolution aerial
images. Considering that manually labeling such images is extremely arduous, we
resort to low-cost annotations from crowdsourcing platforms, e.g.,
OpenStreetMap (OSM). However, OSM data can be incomplete or incorrect, which
introduces noise into the image labels. To address this issue,
we visually inspect 14,000 images and correct their scene labels, yielding a
subset of cleanly-annotated images, named MultiScene-Clean. With it, we can
develop and evaluate deep networks for multi-scene recognition using clean
data. Moreover, we provide crowdsourced annotations of all images for the
purpose of studying network learning with noisy labels. We conduct experiments
with extensive baseline models on both MultiScene-Clean and MultiScene to offer
benchmarks for multi-scene recognition in single images and learning from noisy
labels for this task, respectively. To facilitate progress, we will make our
dataset and pre-trained models available.
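Multi-scene recognition is a multi-label problem: each image can activate several scene categories at once, so the standard formulation is a per-label sigmoid with binary cross-entropy rather than a single softmax. A minimal sketch of that loss and decision rule follows; the function names and the 0.5 threshold are illustrative assumptions, not details taken from the paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over all scene labels.

    Each image may contain several scenes, so every label gets an
    independent sigmoid instead of one softmax over the classes.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

def predict_scenes(logits, threshold=0.5):
    """Assign an image every scene whose sigmoid score passes the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]
```

For example, logits `[2.0, -1.0, 0.5]` yield the scene set `[1, 0, 1]` under a 0.5 threshold, i.e., the first and third scenes are both present in the image.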
Related papers
- Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
arXiv Detail & Related papers (2023-10-09T10:22:08Z)
- Saliency Guided Contrastive Learning on Scene Images [71.07412958621052]
We leverage the saliency map derived from the model's output during learning to highlight discriminative regions and guide the contrastive learning process.
Our method significantly improves self-supervised learning on scene images, gaining +1.1, +4.3, and +2.2 Top-1 accuracy on ImageNet linear evaluation and on semi-supervised learning with 1% and 10% of ImageNet labels, respectively.
arXiv Detail & Related papers (2023-02-22T15:54:07Z)
- Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
arXiv Detail & Related papers (2022-07-10T07:36:36Z)
- Facing the Void: Overcoming Missing Data in Multi-View Imagery [0.783788180051711]
We propose a novel technique for multi-view image classification robust to this problem.
The proposed method, based on state-of-the-art deep learning-based approaches and metric learning, can be easily adapted and exploited in other applications and domains.
Results show that the proposed algorithm provides improvements in multi-view image classification accuracy when compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T13:21:27Z)
- Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling [48.30060717413166]
Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content by assigning a semantic label to every pixel.
We present a large-scale scene classification dataset, termed Million-AID, that contains one million aerial images.
We also report benchmarking experiments using classical convolutional neural networks (CNNs) to achieve pixel-wise semantic labeling.
arXiv Detail & Related papers (2022-01-06T07:40:47Z)
- Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- Semantic Diversity Learning for Zero-Shot Multi-label Classification [14.480713752871523]
This study introduces an end-to-end model training for multi-label zero-shot learning.
We propose to use an embedding matrix having principal embedding vectors trained using a tailored loss function.
In addition, during training, we propose up-weighting, in the loss function, image samples that present higher semantic diversity, to encourage diversity in the embedding matrix.
arXiv Detail & Related papers (2021-05-12T19:39:07Z)
- Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks [14.218223473363276]
We propose a prototype-based memory network to recognize multiple scenes in a single image.
The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module.
To facilitate the progress of aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset.
arXiv Detail & Related papers (2021-04-22T17:32:14Z)
- Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns.
It is difficult for encoders to capture powerful representations in such complex situations.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
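The noisy-label setting in the entry above, like the OSM-derived labels in MultiScene itself, is often handled with loss corrections rather than architectural changes. One common baseline is soft bootstrapping (Reed et al., 2015), which blends each noisy target with the model's own prediction so that confident predictions can gradually override suspect labels. The sketch below is that generic baseline, not the attention-aware method of the paper above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bootstrap_bce(logits, noisy_targets, beta=0.8):
    """Soft-bootstrapping binary cross-entropy.

    Each target is replaced by beta * label + (1 - beta) * prediction,
    so a confidently wrong label contributes less to the loss than it
    would under plain BCE (recovered exactly when beta = 1).
    """
    total = 0.0
    for z, y in zip(logits, noisy_targets):
        p = sigmoid(z)
        t = beta * y + (1.0 - beta) * p  # blended target
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)
```

With `beta=1.0` this reduces to ordinary BCE; lowering `beta` shrinks the penalty on samples where the model confidently disagrees with a possibly noisy label.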
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.