AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes
- URL: http://arxiv.org/abs/2106.10409v1
- Date: Sat, 19 Jun 2021 03:30:22 GMT
- Title: AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large Scenes
- Authors: Jingtao Xu and Yali Li and Shengjin Wang
- Abstract summary: Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
- Score: 57.969186815591186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection in large-scale scenes is a challenging problem due to small objects
and extreme scale variation. It is essential to focus on the image regions of
small objects. In this paper, we propose a novel Adaptive Zoom (AdaZoom)
network as a selective magnifier with flexible shape and focal length to
adaptively zoom the focus regions for object detection. Based on policy
gradient, we construct a reinforcement learning framework for focus region
generation, with the reward formulated by object distributions. The scales and
aspect ratios of the generated regions are adaptive to the scales and
distribution of objects inside. We apply variable magnification according to
the scale of the region for adaptive multi-scale detection. We further propose
collaborative training to complementarily promote the performance of AdaZoom
and the detection network. To validate the effectiveness, we conduct extensive
experiments on VisDrone2019, UAVDT, and DOTA datasets. The experiments show
AdaZoom brings a consistent and significant improvement over different
detection networks, achieving state-of-the-art performance on these datasets,
especially outperforming existing methods by 4.64% AP on VisDrone2019.
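The variable-magnification idea from the abstract (zoom each focus region by a factor adapted to its scale, so regions containing small objects are magnified more) can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation: the magnification rule and all names here are assumptions.

```python
# Hypothetical sketch of adaptive region zooming: a focus region is clipped
# to the image and assigned a magnification factor that upsamples it toward a
# fixed target size, capped at a maximum zoom. Smaller regions (which tend to
# contain smaller objects) therefore receive larger magnification.

def magnification_for(region_w, region_h, target_size=800, max_zoom=4.0):
    """Zoom factor that scales the region's longest side toward target_size,
    clamped to [1.0, max_zoom]."""
    longest = max(region_w, region_h)
    return min(max_zoom, max(1.0, target_size / longest))

def zoom_region(region, image_w, image_h, target_size=800, max_zoom=4.0):
    """Clip region (x, y, w, h) to the image; return the clipped box, its
    magnification factor, and the zoomed output size."""
    x, y, w, h = region
    x, y = max(0, x), max(0, y)
    w, h = min(w, image_w - x), min(h, image_h - y)
    m = magnification_for(w, h, target_size, max_zoom)
    return (x, y, w, h), m, (round(w * m), round(h * m))
```

A 400x200 region would be zoomed 2x toward the 800-pixel target, while a tiny 100x100 region hits the 4x cap; detections on the zoomed crop would then be mapped back by dividing coordinates by the factor.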
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions [17.28368878719324]
We propose a two-stage adaptive network (TAN) for cross-domain crater detection.
Our network is built on the YOLOv5 detector, where a series of strategies are employed to enhance its cross-domain generalisation ability.
Experimental results on benchmark datasets demonstrate that the proposed network can enhance domain adaptation ability for crater detection under varying scenario distributions.
arXiv Detail & Related papers (2023-12-11T07:16:49Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Local Magnification for Data and Feature Augmentation [53.04028225837681]
We propose an easy-to-implement, model-free data augmentation method called Local Magnification (LOMA).
LOMA generates additional training data by randomly magnifying a local area of the image.
Experiments show that our proposed LOMA, though straightforward, can be combined with standard data augmentation to significantly improve the performance on image classification and object detection.
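The random local magnification described above can be sketched roughly as follows. The windowing scheme and nearest-neighbour sampling are assumptions for illustration, not the paper's exact procedure.

```python
import random

def local_magnify(img, zoom=2.0, patch=None, rng=None):
    """Rough sketch of LOMA-style augmentation: magnify a random local window
    in place by sampling (nearest neighbour) about the window centre, so the
    central content of the window appears enlarged while the image size is
    unchanged. `img` is a list of rows (H x W). Details are assumptions, not
    the paper's exact method.
    """
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    if patch is None:
        # default: a random window covering a quarter of the image
        pw, ph = w // 2, h // 2
        px = rng.randrange(0, w - pw + 1)
        py = rng.randrange(0, h - ph + 1)
    else:
        px, py, pw, ph = patch
    out = [row[:] for row in img]
    for dy in range(ph):
        for dx in range(pw):
            # source pixel that lands here after zooming about the centre
            sx = px + pw / 2 + (dx - pw / 2) / zoom
            sy = py + ph / 2 + (dy - ph / 2) / zoom
            out[py + dy][px + dx] = img[int(sy)][int(sx)]
    return out
```

As an augmentation, this would be applied with random window position and zoom factor per training image, alongside standard flips and crops.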
arXiv Detail & Related papers (2022-11-15T02:51:59Z)
- Progressive Domain Adaptation with Contrastive Learning for Object Detection in the Satellite Imagery [0.0]
State-of-the-art object detection methods largely fail to identify small and dense objects.
We propose a small object detection pipeline that improves the feature extraction process.
We show we can alleviate the degradation of object identification in previously unseen datasets.
arXiv Detail & Related papers (2022-09-06T15:16:35Z)
- Bidirectional Multi-scale Attention Networks for Semantic Segmentation of Oblique UAV Imagery [30.524771772192757]
We propose the novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction.
Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
arXiv Detail & Related papers (2021-02-05T11:02:15Z)
- Dense Multiscale Feature Fusion Pyramid Networks for Object Detection in UAV-Captured Images [0.09065034043031667]
We propose a novel method called Dense Multiscale Feature Fusion Pyramid Networks (DMFFPN), which aims to extract features as rich as possible.
Specifically, the dense connection is designed to fully utilize the representations from different convolutional layers.
Experiments on the drone-based VisDrone-DET dataset show competitive performance of our method.
arXiv Detail & Related papers (2020-12-19T10:05:31Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- Crowd Scene Analysis by Output Encoding [38.69524011345539]
We propose a Compressed Sensing Output Encoding (CSOE) scheme, which casts the detection of small-object coordinates as signal regression in an encoded signal space.
CSOE helps to boost localization performance in circumstances where targets are highly crowded without huge scale variation.
We also develop an Adaptive Receptive Field Weighting (ARFW) module, which deals with scale variation issue.
arXiv Detail & Related papers (2020-01-27T01:34:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.