A Competitive Method to VIPriors Object Detection Challenge
- URL: http://arxiv.org/abs/2104.09059v1
- Date: Mon, 19 Apr 2021 05:33:39 GMT
- Title: A Competitive Method to VIPriors Object Detection Challenge
- Authors: Fei Shen, Xin He, Mengwan Wei and Yi Xie
- Abstract summary: This report introduces the technical details of our submission to the VIPriors object detection challenge.
We introduce an effective data augmentation method to address the lack of data problem, which contains bbox-jitter, grid-mask, and mix-up.
We also present a robust region of interest (ROI) extraction method to learn more significant ROI features.
- Score: 13.024811732127615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this report, we introduce the technical details of our submission to the
VIPriors object detection challenge. Our solution is based on mmdetction of a
strong baseline open-source detection toolbox. Firstly, we introduce an
effective data augmentation method to address the lack of data problem, which
contains bbox-jitter, grid-mask, and mix-up. Secondly, we present a robust
region of interest (ROI) extraction method to learn more significant ROI
features via embedding global context features. Thirdly, we propose a
multi-model integration strategy to refinement the prediction box, which
weighted boxes fusion (WBF). Experimental results demonstrate that our approach
can significantly improve the average precision (AP) of object detection on the
subset of the COCO2017 dataset.
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - MUFASA: Multi-View Fusion and Adaptation Network with Spatial Awareness for Radar Object Detection [3.1212590312985986]
sparsity of radar point clouds poses challenges in achieving precise object detection.
This paper introduces a comprehensive feature extraction method for radar point clouds.
We achieve state-of-the-art results among radar-based methods on the VoD dataset with the mAP of 50.24%.
arXiv Detail & Related papers (2024-08-01T13:52:18Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Attention Based Feature Fusion For Multi-Agent Collaborative Perception [4.120288148198388]
We propose an intermediate collaborative perception solution in the form of a graph attention network (GAT)
The proposed approach develops an attention-based aggregation strategy to fuse intermediate representations exchanged among multiple connected agents.
This approach adaptively highlights important regions in the intermediate feature maps at both the channel and spatial levels, resulting in improved object detection precision.
arXiv Detail & Related papers (2023-05-03T12:06:11Z) - MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region of interest (RoI) -pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z) - Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z) - DHARI Report to EPIC-Kitchens 2020 Object Detection Challenge [2.0263791972068628]
In this report, we describe the technical details of oursubmission to the EPIC-Kitchens Object Detection Challenge.
Duck filling and mix-up techniques are introduced to augment the data and significantly improve the robustness of the proposed method.
To bridge the gap of category imbalance, Class Balance Sampling is utilized and greatly improves the test results.
arXiv Detail & Related papers (2020-06-28T09:29:48Z) - Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.