NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in
Aerial Images
- URL: http://arxiv.org/abs/2401.10530v1
- Date: Fri, 19 Jan 2024 07:12:36 GMT
- Title: NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in
Aerial Images
- Authors: Junyu Gao, Liangliang Zhao, and Xuelong Li
- Abstract summary: This paper introduces a Multi-category Object Counting task to estimate the numbers of different objects in an aerial image.
Considering the absence of a dataset for this task, a large-scale dataset is collected, consisting of 3,416 scenes with a resolution of 1024 $\times$ 1024 pixels.
The paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the features of RGB and NIR.
- Score: 64.92809155168595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object counting is a hot topic in computer vision, which aims to estimate the
number of objects in a given image. However, most methods count only a single
object category per image, so they cannot be applied to scenes that require
counting multiple categories simultaneously, which is especially common in aerial
scenes. To this end, this paper introduces a Multi-category Object Counting
(MOC) task to estimate the numbers of different objects (cars, buildings,
ships, etc.) in an aerial image. Considering the absence of a dataset for this
task, a large-scale Dataset (NWPU-MOC) is collected, consisting of 3,416 scenes
with a resolution of 1024 $\times$ 1024 pixels, and well-annotated using 14
fine-grained object categories. Besides, each scene contains RGB and Near
Infrared (NIR) images; the NIR spectrum can provide richer characterization
information than the RGB spectrum alone. Based on
NWPU-MOC, the paper presents a multi-spectrum, multi-category object counting
framework, which employs a dual-attention module to fuse the features of RGB
and NIR and subsequently regress multi-channel density maps corresponding to
each object category. In addition, to model the dependency between the
density-map channels associated with different object categories, a spatial contrast loss
is designed as a penalty for overlapping predictions at the same spatial
position. Experimental results demonstrate that the proposed method achieves
state-of-the-art performance compared with some mainstream counting algorithms.
The dataset, code and models are publicly available at
https://github.com/lyongo/NWPU-MOC.
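The abstract gives only a high-level description of the framework, but its two core ingredients (a per-category density-map head on fused RGB/NIR features, and a spatial contrast loss penalizing overlapping predictions at the same position) can be illustrated with a minimal PyTorch sketch. Everything below, including the module names, the simple concatenation fusion standing in for the dual-attention module, and the exact pairwise form of the loss, is an assumption for illustration rather than the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCategoryDensityHead(nn.Module):
    """Toy head: fuses RGB and NIR feature maps and regresses one
    density-map channel per object category (14 for NWPU-MOC)."""

    def __init__(self, in_channels=256, num_categories=14):
        super().__init__()
        # Simple concat + 1x1 conv fusion stands in for the paper's dual-attention module.
        self.fuse = nn.Conv2d(2 * in_channels, in_channels, kernel_size=1)
        self.regress = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_categories, 1), nn.ReLU(inplace=True),  # densities are non-negative
        )

    def forward(self, feat_rgb, feat_nir):
        fused = self.fuse(torch.cat([feat_rgb, feat_nir], dim=1))
        return self.regress(fused)  # (B, num_categories, H, W)

def spatial_contrast_loss(density):
    """Assumed form of the loss: penalize two different category channels
    both predicting mass at the same spatial position (pairwise products)."""
    b, c, h, w = density.shape
    loss = density.new_zeros(())
    for i in range(c):
        for j in range(i + 1, c):
            loss = loss + (density[:, i] * density[:, j]).mean()
    return loss / (c * (c - 1) / 2)

# Usage with dummy backbone features and a dummy ground-truth density map
feat_rgb = torch.randn(2, 256, 64, 64)
feat_nir = torch.randn(2, 256, 64, 64)
head = MultiCategoryDensityHead()
pred = head(feat_rgb, feat_nir)
gt = torch.rand_like(pred)
total = F.mse_loss(pred, gt) + 0.1 * spatial_contrast_loss(pred)
```

In the actual method the fused features come from the dual-attention module and the loss weighting may differ; the repository linked above is the authoritative reference.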
Related papers
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z) - Occlusion-Aware Instance Segmentation via BiLayer Network Architectures [73.45922226843435]
We propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).
We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely, Fully Convolutional Network (FCN) and Graph Convolutional Network (GCN).
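The bilayer decomposition can be sketched as a shared RoI feature feeding two mask branches, the second conditioned on the first; plain convolutions below stand in for the FCN/GCN heads the paper compares, and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BilayerMaskHead(nn.Module):
    """Sketch of a two-layer mask head: the top branch predicts the occluder
    mask, the bottom branch predicts the (partially hidden) occludee mask,
    conditioned on the occluder prediction."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.occluder_branch = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1),
        )
        # The occludee branch also sees the occluder logits (1 extra channel).
        self.occludee_branch = nn.Sequential(
            nn.Conv2d(in_channels + 1, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1),
        )

    def forward(self, roi_feat):
        occluder_logits = self.occluder_branch(roi_feat)
        occludee_logits = self.occludee_branch(
            torch.cat([roi_feat, occluder_logits], dim=1))
        return occluder_logits, occludee_logits

head = BilayerMaskHead()
occluder, occludee = head(torch.randn(4, 256, 28, 28))
```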
arXiv Detail & Related papers (2022-08-08T21:39:26Z) - Scale Invariant Semantic Segmentation with RGB-D Fusion [12.650574326251023]
We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information to the RGB data for pixel-wise semantic segmentation to address the different scale objects in an outdoor scene.
Our model is compact and can be easily applied to other RGB models.
arXiv Detail & Related papers (2022-04-10T12:54:27Z) - You Better Look Twice: a new perspective for designing accurate
detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computation by separating objects from the background using a very lightweight first stage.
The resulting image proposals are then processed in the second stage by a highly accurate model.
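A hedged sketch of that two-stage pattern: a very small first-stage network scores the whole image cheaply, and only the top-scoring regions are cropped and passed through a heavier second-stage model. The placeholder networks and crop logic below are illustrative, not BLT-net's actual components:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks: a very small first stage and a heavier second stage.
lite_stage = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=4, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(8, 1, 1),  # coarse objectness logits at 1/4 resolution
)
heavy_stage = nn.Sequential(  # stands in for an accurate detector/classifier
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)

def two_stage_inference(image, crop=128, num_regions=4):
    """Score the whole image with the lite stage, then run the heavy stage
    only on crops around the top-scoring coarse locations."""
    objectness = torch.sigmoid(lite_stage(image))[0, 0]      # (H/4, W/4)
    top = torch.topk(objectness.flatten(), k=num_regions).indices
    outputs = []
    for idx in top.tolist():
        y, x = divmod(idx, objectness.shape[1])
        cy, cx = y * 4, x * 4                                 # back to input resolution
        patch = image[:, :, cy:cy + crop, cx:cx + crop]
        # Zero-pad crops that fall off the right/bottom border.
        patch = F.pad(patch, (0, crop - patch.shape[-1], 0, crop - patch.shape[-2]))
        outputs.append(heavy_stage(patch))
    return outputs

preds = two_stage_inference(torch.randn(1, 3, 256, 256))
```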
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - Single Object Tracking through a Fast and Effective Single-Multiple
Model Convolutional Neural Network [0.0]
Recent state-of-the-art (SOTA) approaches rely on a matching network with a heavy structure to distinguish the target from other objects in the area.
In this article, a special architecture is proposed that, in contrast to previous approaches, can identify the object location in a single shot.
The presented tracker performs comparably to the SOTA in challenging situations while being much faster (up to 120 FPS on a GTX 1080 Ti).
arXiv Detail & Related papers (2021-03-28T11:02:14Z) - Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting [18.733301622920102]
Multi-class object counting expands the scope of application of the object counting task.
The multi-target detection task can achieve multi-class object counting in some scenarios.
We propose a simple yet efficient counting network based on point-level annotations.
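Point-level annotations are usually turned into ground-truth density maps by placing a small Gaussian at every annotated point, so that each class channel integrates to that class's count. The recipe below is standard practice, assumed here for illustration rather than taken from the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def points_to_density(points, height, width, num_classes, sigma=4.0):
    """Convert point annotations into a per-class density map.

    points: list of (x, y, class_id) tuples; the result has shape
    (num_classes, height, width) and each channel sums to that class's count.
    """
    density = np.zeros((num_classes, height, width), dtype=np.float32)
    for x, y, c in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            density[c, yi, xi] += 1.0
    for c in range(num_classes):
        density[c] = gaussian_filter(density[c], sigma=sigma)  # spreads each point, preserves its mass
    return density

# Two objects of class 0 and one of class 1 in a 256x256 tile (class ids are illustrative)
gt = points_to_density([(40.2, 70.9, 0), (120.0, 31.5, 0), (200.1, 180.3, 1)],
                       256, 256, num_classes=2)
print(gt.sum(axis=(1, 2)))   # approximately [2., 1.]
```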
arXiv Detail & Related papers (2020-12-15T08:38:28Z) - Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in
Large Scenes [50.744452135300115]
We propose a deep neural network framework for multi-view crowd counting.
Our methods achieve state-of-the-art results compared to other multi-view counting baselines.
arXiv Detail & Related papers (2020-12-02T03:20:30Z) - Multi Receptive Field Network for Semantic Segmentation [8.06045579589765]
We propose a new Multi-Receptive Field Module (MRFM) for semantic segmentation.
We also design an edge-aware loss which is effective in distinguishing the boundaries of object/stuff.
Specifically, we achieve a mean IoU of 83.0 on the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
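The edge-aware loss is only named in the summary; one common way to realize such a loss, assumed here purely for illustration, is to up-weight the per-pixel cross-entropy near ground-truth label boundaries:

```python
import torch
import torch.nn.functional as F

def edge_aware_ce(logits, labels, edge_weight=5.0):
    """Cross-entropy where pixels on or next to a class boundary in the
    ground-truth label map receive a larger weight. One plausible
    realization of an 'edge-aware' loss, not the paper's exact formulation."""
    # A pixel is a boundary pixel if any 3x3 neighbour has a different label.
    lbl = labels.float().unsqueeze(1)                       # (B, 1, H, W)
    local_max = F.max_pool2d(lbl, 3, stride=1, padding=1)
    local_min = -F.max_pool2d(-lbl, 3, stride=1, padding=1)
    is_edge = (local_max != local_min).squeeze(1).float()   # (B, H, W)
    weights = 1.0 + edge_weight * is_edge
    per_pixel = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_pixel).sum() / weights.sum()

logits = torch.randn(2, 19, 64, 64)          # e.g. 19 Cityscapes classes
labels = torch.randint(0, 19, (2, 64, 64))
loss = edge_aware_ce(logits, labels)
```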
arXiv Detail & Related papers (2020-11-17T11:52:23Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z) - Counting dense objects in remote sensing images [52.182698295053264]
Estimating the number of objects of interest in a given image is a challenging yet important task.
In this paper, we are interested in counting dense objects from remote sensing images.
To address these issues, we first construct a large-scale object counting dataset based on remote sensing images.
We then benchmark the dataset by designing a novel neural network which can generate density map of an input image.
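Counts are recovered from a predicted density map by integrating (summing) it, and counting benchmarks typically report MAE and RMSE over per-image counts; the snippet below shows that standard evaluation, not this paper's specific protocol:

```python
import numpy as np

def evaluate_counts(pred_density_maps, gt_counts):
    """pred_density_maps: list of (H, W) arrays; gt_counts: list of ints.
    The predicted count of an image is the sum over its density map."""
    pred_counts = np.array([d.sum() for d in pred_density_maps])
    gt = np.asarray(gt_counts, dtype=np.float64)
    mae = np.abs(pred_counts - gt).mean()
    rmse = np.sqrt(((pred_counts - gt) ** 2).mean())
    return mae, rmse

maps = [np.full((64, 64), 0.01), np.full((64, 64), 0.03)]   # toy predictions
print(evaluate_counts(maps, [40, 120]))                      # (MAE, RMSE)
```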
arXiv Detail & Related papers (2020-02-14T09:13:54Z)