Enhancing Geometric Factors in Model Learning and Inference for Object
Detection and Instance Segmentation
- URL: http://arxiv.org/abs/2005.03572v4
- Date: Mon, 5 Jul 2021 08:21:41 GMT
- Title: Enhancing Geometric Factors in Model Learning and Inference for Object
Detection and Instance Segmentation
- Authors: Zhaohui Zheng and Ping Wang and Dongwei Ren and Wei Liu and Rongguang
Ye and Qinghua Hu and Wangmeng Zuo
- Abstract summary: We propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS).
The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss.
Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
- Score: 91.12575065731883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based object detection and instance segmentation have achieved
unprecedented progress. In this paper, we propose Complete-IoU (CIoU) loss and
Cluster-NMS for enhancing geometric factors in both bounding box regression and
Non-Maximum Suppression (NMS), leading to notable gains of average precision
(AP) and average recall (AR), without the sacrifice of inference efficiency. In
particular, we consider three geometric factors, i.e., overlap area, normalized
central point distance and aspect ratio, which are crucial for measuring
bounding box regression in object detection and instance segmentation. The
three geometric factors are then incorporated into CIoU loss for better
distinguishing difficult regression cases. The training of deep models using
CIoU loss results in consistent AP and AR improvements in comparison to widely
adopted $\ell_n$-norm loss and IoU-based loss. Furthermore, we propose
Cluster-NMS, where NMS during inference is done by implicitly clustering
detected boxes and usually requires fewer iterations. Cluster-NMS is very
efficient due to its pure GPU implementation, and geometric factors can be
incorporated to improve both AP and AR. In the experiments, CIoU loss and
Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g.,
YOLACT and BlendMask-RT), and object detection (e.g., YOLO v3, SSD and Faster
R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves
performance gains of +1.7 AP and +6.2 AR$_{100}$ for object detection, and +0.9
AP and +3.5 AR$_{100}$ for instance segmentation, with 27.1 FPS on one NVIDIA
GTX 1080Ti GPU. All the source code and trained models are available at
https://github.com/Zzh-tju/CIoU
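
The abstract names the three geometric factors but does not spell out the loss itself. The following is a minimal sketch of how they can be combined, assuming the commonly cited form of CIoU (one minus IoU, plus the squared center distance normalized by the squared diagonal of the smallest enclosing box, plus a weighted aspect-ratio term). The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative assumptions and are not taken from the authors' repository.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """Sketch of a CIoU-style loss for two axis-aligned boxes (x1, y1, x2, y2).

    Assumed form: 1 - IoU + rho^2 / c^2 + alpha * v.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Factor 1: overlap area, measured by IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Factor 2: squared center distance, normalized by the squared
    # diagonal of the smallest box enclosing both inputs.
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    rho2 = dx * dx + dy * dy
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch + eps

    # Factor 3: aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under these assumptions, two identical boxes give a loss close to zero, while two non-overlapping boxes still receive a distance-dependent penalty, which a plain IoU loss cannot provide because its value is constant once the overlap vanishes.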
Related papers
- Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks.
Approximate MU is a practical method for large-scale models.
We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z)
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD).
Existing methods often incur high computational costs, which contradicts the real-time demands of AD.
We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
- Rethinking IoU-based Optimization for Single-stage 3D Object Detection [103.83141677242871]
We propose a Rotation-Decoupled IoU (RDIoU) method that can mitigate the rotation-sensitivity issue.
Our RDIoU simplifies the complex interactions of regression parameters by decoupling the rotation variable as an independent term.
arXiv Detail & Related papers (2022-07-19T15:35:23Z)
- Improved Aggregating and Accelerating Training Methods for Spatial Graph Neural Networks on Fraud Detection [0.0]
This work proposes an improved deep architecture that extends CAmouflage-REsistant GNN (CARE-GNN) to deep models, named Residual Layered CARE-GNN (RLC-GNN).
Three issues of RLC-GNN are the limit reached in its use of neighboring information, the training difficulty, and the lack of comprehensive consideration of node features and external patterns.
Experiments are conducted on Yelp and Amazon datasets.
arXiv Detail & Related papers (2022-02-14T09:51:35Z)
- The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with SkewIoU loss.
Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
- Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors relies on many conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
- INSTA-YOLO: Real-Time Instance Segmentation [2.726684740197893]
We propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation.
The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by regression in the localization head.
We evaluate our model on three datasets, namely Carvana, Cityscapes and Airbus.
arXiv Detail & Related papers (2021-02-12T21:17:29Z)
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU [87.48110331544885]
We propose a novel training methodology to train and scale the existing semantic segmentation models.
We demonstrate a clear benefit of our approach on a dataset with 1284 classes, bootstrapped from LVIS and COCO annotations, with three times better mIoU than the DeeplabV3+ model.
arXiv Detail & Related papers (2020-12-14T13:12:38Z)
- EOLO: Embedded Object Segmentation only Look Once [0.0]
We introduce an anchor-free and single-shot instance segmentation method, which is conceptually simple with 3 independent branches, fully convolutional and can be used by easily embedding it into mobile and embedded devices.
Our method, referred to as EOLO, reformulates the instance segmentation problem as predicting semantic segmentation and distinguishing overlapping objects, through instance center classification and 4D distance regression on each pixel.
Without any bells and whistles, EOLO achieves 27.7% in mask mAP under IoU50 and reaches 30 FPS on 1080Ti GPU, with a single-model and single-scale training/testing on
arXiv Detail & Related papers (2020-03-31T21:22:05Z)
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.