HRDNet: High-resolution Detection Network for Small Objects
- URL: http://arxiv.org/abs/2006.07607v1
- Date: Sat, 13 Jun 2020 10:25:35 GMT
- Title: HRDNet: High-resolution Detection Network for Small Objects
- Authors: Ziming Liu and Guangyu Gao and Lin Sun and Zhiyuan Fang
- Abstract summary: Small object detection is challenging because small objects do not contain detailed information and may even disappear in the deep network.
We propose the High-Resolution Detection Network (HRDNet) to keep the benefits of high-resolution images without bringing up new problems.
We propose Multi-Depth Image Pyramid Network (MD-IPN) and Multi-Scale Feature Pyramid Network (MS-FPN) in HRDNet.
- Score: 10.802856121451404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small object detection is challenging because small objects do not contain
detailed information and may even disappear in the deep network. Usually,
feeding high-resolution images into a network can alleviate this issue.
However, simply enlarging the resolution causes new problems: it aggravates
the large variance of object scales and introduces an unbearable computation
cost. To keep the benefits of high-resolution images without introducing new
problems, we propose the High-Resolution Detection Network (HRDNet). HRDNet
takes multiple resolution inputs using multi-depth backbones. To take full
advantage of these multiple features, we propose the Multi-Depth Image
Pyramid Network (MD-IPN) and the Multi-Scale Feature Pyramid Network (MS-FPN)
in HRDNet. MD-IPN maintains multiple kinds of positional information using
backbones of different depths. Specifically, the high-resolution input is fed
into a shallow network to preserve more positional information and reduce the
computational cost, while the low-resolution input is fed into a deep network
to extract more semantics. By extracting various features from high to low
resolutions, MD-IPN improves the performance of small object detection while
maintaining the performance on middle and large objects. MS-FPN is proposed
to align and fuse the multi-scale feature groups generated by MD-IPN,
reducing the information imbalance between these multi-scale, multi-level
features. Extensive experiments and ablation studies are conducted on the
standard benchmark datasets MS COCO 2017 and Pascal VOC 2007/2012 and on a
typical small object dataset, VisDrone 2019. Notably, our proposed HRDNet
achieves state-of-the-art results on these datasets and performs particularly
well on small objects.
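The multi-depth pairing described in the abstract (high-resolution input into a shallow backbone, low-resolution input into a deep backbone, then alignment and fusion) can be sketched in a toy form. The sketch below is an illustrative assumption, not the paper's implementation: `toy_backbone` stands in for a real CNN (here, repeated box filtering), and the MS-FPN fusion is reduced to nearest-neighbour upsampling plus averaging.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x, times):
    """Nearest-neighbour upsampling by a factor of 2**times."""
    for _ in range(times):
        x = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return x

def image_pyramid(img, levels=3):
    """Multi-resolution inputs: pyr[0] is the full-resolution image."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(avg_pool2(pyr[-1]))
    return pyr

def toy_backbone(x, depth):
    """Stand-in for a CNN backbone: `depth` rounds of 3x3 box filtering.
    Deeper = more smoothing, loosely mimicking larger receptive fields."""
    for _ in range(depth):
        p = np.pad(x, 1, mode="edge")
        x = sum(p[i:i + x.shape[0], j:j + x.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
    return x

def md_ipn(pyramid, depths=(1, 2, 4)):
    """MD-IPN idea: high-res input -> shallow backbone (position),
    low-res input -> deep backbone (semantics)."""
    return [toy_backbone(x, d) for x, d in zip(pyramid, depths)]

def ms_fpn(feature_groups):
    """MS-FPN idea: align all groups to the finest resolution and fuse."""
    aligned = [upsample2(f, k) for k, f in enumerate(feature_groups)]
    return np.mean(aligned, axis=0)

img = np.random.rand(16, 16)
groups = md_ipn(image_pyramid(img, levels=3))
fused = ms_fpn(groups)
print(fused.shape)  # (16, 16)
```

The key design choice the sketch mirrors is the inverse pairing of resolution and depth: the expensive high-resolution branch stays shallow, which is how HRDNet keeps the computational cost of high-resolution inputs bounded.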
Related papers
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD.
All the images are finely annotated in pixel-level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection [68.65338791283298]
Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video.
Traditional SOD methods are largely limited to low-resolution images, making them difficult to adapt to high-resolution SOD.
In this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution.
arXiv Detail & Related papers (2023-08-07T17:49:04Z)
- Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation [10.919956120261539]
High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas.
Objects of the same category within HRS images show significant differences in scale and shape across diverse geographical environments.
We propose a High-resolution remote sensing network (Hi-ResNet) with efficient network structure designs.
arXiv Detail & Related papers (2023-05-22T03:58:25Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- SaNet: Scale-aware Neural Network for Parsing Multiple Spatial Resolution Aerial Images [0.0]
We propose a novel scale-aware neural network (SaNet) for parsing multiple spatial resolution aerial images.
To cope with the imbalanced segmentation quality between larger and smaller objects caused by scale variation, SaNet deploys a densely connected feature pyramid network (DCFPN) module.
To alleviate informative feature loss, an SFR module is incorporated into the network to learn scale-invariant features with spatial relation enhancement.
arXiv Detail & Related papers (2021-03-14T14:19:46Z)
- Dense Multiscale Feature Fusion Pyramid Networks for Object Detection in UAV-Captured Images [0.09065034043031667]
We propose a novel method called Dense Multiscale Feature Fusion Pyramid Networks (DMFFPN), which aims to obtain features as rich as possible.
Specifically, dense connections are designed to fully utilize the representations from different convolutional layers.
Experiments on the drone-based dataset VisDrone-DET suggest a competitive performance of our method.
arXiv Detail & Related papers (2020-12-19T10:05:31Z)
- Resolution Adaptive Networks for Efficient Inference [53.04907454606711]
We propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs.
In RANet, the input images are first routed to a lightweight sub-network that efficiently extracts low-resolution representations.
High-resolution paths in the network maintain the capability to recognize the "hard" samples.
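The early-exit routing summarized above (cheap low-resolution sub-network first, costlier high-resolution paths only for "hard" samples) can be sketched with a confidence threshold. This is a hypothetical simplification of RANet's mechanism: `resolution_adaptive_predict`, the threshold value, and the toy sub-networks are illustrative assumptions, not the paper's API.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def resolution_adaptive_predict(x, subnets, threshold=0.8):
    """Run sub-networks ordered cheap/low-res -> costly/high-res,
    exiting early once the prediction is confident enough."""
    for subnet in subnets:
        probs = softmax(subnet(x))
        if probs.max() >= threshold:   # confident: skip the remaining paths
            return int(probs.argmax()), float(probs.max())
    # no sub-network was confident: return the last (strongest) answer
    return int(probs.argmax()), float(probs.max())

# Toy sub-networks mapping an input to class logits (placeholders for
# the low- and high-resolution paths).
low_res = lambda x: np.array([3.0, 0.0])   # already confident
high_res = lambda x: np.array([0.0, 5.0])  # never reached here

label, conf = resolution_adaptive_predict(None, [low_res, high_res])
print(label)  # 0, decided by the cheap low-resolution path alone
```

The design point is that inference cost becomes input-dependent: easy samples pay only for the lightweight low-resolution pass.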
arXiv Detail & Related papers (2020-03-16T16:54:36Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches to the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.