OGMN: Occlusion-guided Multi-task Network for Object Detection in UAV
Images
- URL: http://arxiv.org/abs/2304.11805v1
- Date: Mon, 24 Apr 2023 03:30:00 GMT
- Title: OGMN: Occlusion-guided Multi-task Network for Object Detection in UAV
Images
- Authors: Xuexue Li, Wenhui Diao, Yongqiang Mao, Peng Gao, Xiuhua Mao, Xinming
Li and Xian Sun
- Abstract summary: Occlusion between objects is one of the overlooked challenges for object detection in UAV images.
We introduce the occlusion-guided multi-task network (OGMN) to address this challenge.
Our OGMN achieves 35.0% mAP on the VisDrone dataset and outperforms the baseline by 5.3%.
- Score: 13.90359920041577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occlusion between objects is an overlooked challenge for object
detection in UAV images. Because UAVs fly at varying altitudes and angles,
occlusion occurs more frequently in UAV images than in natural scenes.
Compared with occlusion in natural scene images, occlusion in UAV images is
accompanied by a feature confusion problem and shows a local aggregation
characteristic. We found that extracting or localizing occlusion between
objects helps the detector address this challenge. Based on this finding, an
occlusion localization task is introduced, which together with the object
detection task constitutes our occlusion-guided multi-task network (OGMN). The
OGMN comprises occlusion localization and two occlusion-guided multi-task
interactions. Specifically, an occlusion estimation module (OEM) is proposed to
precisely localize occlusion. The OGMN then uses the occlusion localization
results to implement occlusion-guided detection through two multi-task
interactions. The first interaction takes place between the two task decoders
to address the feature confusion problem: an occlusion decoupling head (ODH)
is proposed to replace the general detection head. The second interaction is
built into the detection process according to the local aggregation
characteristic: a two-phase progressive refinement process (TPP) is
proposed to optimize the detection process. Extensive experiments demonstrate
the effectiveness of OGMN on the VisDrone and UAVDT datasets. In
particular, OGMN achieves 35.0% mAP on the VisDrone dataset and outperforms
the baseline by 5.3%. OGMN thus provides a new insight for accurate
occlusion localization and achieves competitive detection performance.
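The abstract does not say how occlusion regions are defined or supervised. As an illustration only, one simple way to mark where objects may occlude one another is to take the pairwise intersections of their bounding boxes. The sketch below assumes axis-aligned boxes in `(x1, y1, x2, y2)` form; the names `intersection` and `occlusion_regions` are hypothetical and do not come from the paper, and this is not the authors' OEM.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping rectangle of two boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        # Boxes that only touch at an edge or corner are treated as non-overlapping.
        return None
    return (x1, y1, x2, y2)


def occlusion_regions(boxes: List[Box]) -> List[Box]:
    """Collect the overlap rectangle of every pair of boxes (illustrative proxy for occlusion)."""
    regions = []
    for a, b in combinations(boxes, 2):
        region = intersection(a, b)
        if region is not None:
            regions.append(region)
    return regions


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
    print(occlusion_regions(boxes))  # [(5, 5, 10, 10)]
```

Rectangles like these could serve as rough localization targets in a toy setting. Whether the paper derives its occlusion labels this way is not stated in the abstract.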
Related papers
- Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection [73.85890512959861]
We propose a task-agnostic framework to unify Salient Object Detection (SOD) and Camouflaged Object Detection (COD).
We design a simple yet effective contextual decoder involving the interval-layer and global context, which achieves an inference speed of 67 fps.
Experiments on public SOD and COD datasets demonstrate the superiority of our proposed framework in both supervised and unsupervised settings.
arXiv Detail & Related papers (2024-12-22T03:25:43Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Multiple Object Tracking based on Occlusion-Aware Embedding Consistency Learning [46.726678333518066]
The method comprises an Occlusion Prediction Module (OPM) and an Occlusion-Aware Association Module (OAAM).
OPM predicts occlusion information for each true detection, facilitating the selection of valid samples for consistency learning of the track's visual embedding.
OAAM generates two separate embeddings for each track, guaranteeing consistency in both unoccluded and occluded detections.
arXiv Detail & Related papers (2023-11-05T06:08:58Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking [7.964206483424679]
Existing methods suffer from inaccurate motion estimation, appearance feature extraction, and association due to occlusion.
We suggest that the key insight is explicit motion estimation, reliable appearance features, and fair association in occlusion scenes.
Specifically, we propose an adaptive occlusion-aware multiple pedestrian tracker, OccluTrack.
arXiv Detail & Related papers (2023-09-19T06:43:18Z)
- Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking [38.36872739816151]
The Occlusion-Aware Attention (OAA) module in the detector highlights the object features while suppressing the occluded background regions.
OAA can serve as a modulator that enhances the detector for some potentially occluded objects.
We design a Re-ID embedding matching block based on the optimal transport problem.
arXiv Detail & Related papers (2023-08-30T06:56:53Z)
- Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion [1.2239546747355885]
We propose a novel approach to jointly execute the two tasks in a single pass.
The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features.
Experimental results show that the proposed multi-task network outperforms its single-task counterpart.
arXiv Detail & Related papers (2023-04-25T03:21:32Z)
- Threatening Patch Attacks on Object Detection in Optical Remote Sensing Images [55.09446477517365]
Advanced Patch Attacks (PAs) on object detection in natural images have pointed out the great safety vulnerability in methods based on deep neural networks.
We propose a more Threatening PA without sacrificing visual quality, dubbed TPA.
To the best of our knowledge, this is the first attempt to study the PAs on object detection in O-RSIs, and we hope this work can get our readers interested in studying this topic.
arXiv Detail & Related papers (2023-02-13T02:35:49Z)
- FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection [4.534713782093219]
A novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems.
FGAHOI comprises three dedicated components, namely multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM).
arXiv Detail & Related papers (2023-01-08T03:53:50Z)
- Detecting Adversarial Perturbations in Multi-Task Perception [32.9951531295576]
We propose a novel adversarial perturbation detection scheme based on multi-task perception of complex vision tasks.
Adversarial perturbations are detected by inconsistencies between extracted edges of the input image, the depth output, and the segmentation output.
We show that, under an assumption of a 5% false positive rate, up to 100% of images are correctly detected as adversarially perturbed, depending on the strength of the perturbation.
arXiv Detail & Related papers (2022-03-02T15:25:17Z)
- SEA: Bridging the Gap Between One- and Two-stage Detector Distillation via SEmantic-aware Alignment [76.80165589520385]
We name our method SEA (SEmantic-aware Alignment) distillation given the nature of abstracting dense fine-grained information.
It achieves new state-of-the-art results on the challenging object detection task on both one- and two-stage detectors.
arXiv Detail & Related papers (2022-03-02T04:24:05Z)
- MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
- Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting [86.33696045574692]
We propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images.
We first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images.
Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation.
arXiv Detail & Related papers (2020-05-05T11:08:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.