Bidirectional Multi-scale Attention Networks for Semantic Segmentation
of Oblique UAV Imagery
- URL: http://arxiv.org/abs/2102.03099v1
- Date: Fri, 5 Feb 2021 11:02:15 GMT
- Title: Bidirectional Multi-scale Attention Networks for Semantic Segmentation
of Oblique UAV Imagery
- Authors: Ye Lyu, George Vosselman, Gui-Song Xia, Michael Ying Yang
- Abstract summary: We propose the novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction.
Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
- Score: 30.524771772192757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation for aerial platforms has been one of the fundamental
scene understanding task for the earth observation. Most of the semantic
segmentation research focused on scenes captured in nadir view, in which
objects have relatively smaller scale variation compared with scenes captured
in oblique view. The huge scale variation of objects in oblique images limits
the performance of deep neural networks (DNN) that process images in a single
scale fashion. In order to tackle the scale variation issue, in this paper, we
propose the novel bidirectional multi-scale attention networks, which fuse
features from multiple scales bidirectionally for more adaptive and effective
feature extraction. The experiments are conducted on the UAVid2020 dataset and
have shown the effectiveness of our method. Our model achieved the
state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score
of 70.80%.
Related papers
- Scale Attention for Learning Deep Face Representation: A Study Against
Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (textbfSCAN-CNN)
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z) - Semantic Labeling of High Resolution Images Using EfficientUNets and
Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose Adaptive Focus Framework (AF$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$ has significantly improved the accuracy on three widely used aerial benchmarks, as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for
Road Pothole Detection [9.356003255288417]
This paper presents a novel pothole detection approach based on single-modal semantic segmentation.
It first extracts visual features from input images using a convolutional neural network.
A channel attention module then reweighs the channel features to enhance the consistency of different feature maps.
arXiv Detail & Related papers (2021-12-24T15:07:47Z) - AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large
Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z) - CDN-MEDAL: Two-stage Density and Difference Approximation Framework for
Motion Analysis [3.337126420148156]
We propose a novel, two-stage method of change detection with two convolutional neural networks.
Our two-stage framework contains approximately 3.5K parameters in total but still maintains rapid convergence to intricate motion patterns.
arXiv Detail & Related papers (2021-06-07T16:39:42Z) - SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.