Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
- URL: http://arxiv.org/abs/2011.12450v2
- Date: Mon, 26 Apr 2021 14:20:03 GMT
- Title: Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
- Authors: Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan,
Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
- Abstract summary: We present Sparse R-CNN, a purely sparse method for object detection in images.
Final predictions are output directly, without non-maximum suppression post-processing.
We hope our work could inspire re-thinking the convention of dense priors in object detectors.
- Score: 77.9701193170127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Sparse R-CNN, a purely sparse method for object detection in
images. Existing works on object detection heavily rely on dense object
candidates, such as $k$ anchor boxes pre-defined on all grid cells of an image
feature map of size $H\times W$. In our method, however, a fixed sparse set of
$N$ learned object proposals is provided to the object recognition head to
perform classification and localization. By reducing $HWk$ (up to hundreds of
thousands) hand-designed object candidates to $N$ (e.g. 100) learnable
proposals, Sparse R-CNN completely avoids all effort related to object
candidate design and many-to-one label assignment. More importantly, final
predictions are output directly, without non-maximum suppression post-processing.
Sparse R-CNN demonstrates accuracy, run-time and training convergence
performance on par with well-established detector baselines on the
challenging COCO dataset, e.g., achieving 45.0 AP with the standard $3\times$
training schedule and running at 22 fps with a ResNet-50 FPN model. We hope our
work could inspire re-thinking the convention of dense priors in object
detectors. The code is available at: https://github.com/PeizeSun/SparseR-CNN.
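To make the $HWk$ versus $N$ contrast concrete, the Python sketch below counts dense candidates for a hypothetical setup: an 800×1333 input, FPN strides 8 to 128, and $k = 9$ anchors per cell. These RetinaNet-style values are assumptions for illustration, not figures from the abstract, and real implementations may pad the input so map sizes differ slightly. It also represents $N$ learnable proposals as normalized (cx, cy, w, h) boxes; the whole-image initialization and all helper names are hypothetical, and in an actual model these boxes would be trainable parameters rather than a fixed list.

```python
import math
from typing import List, Sequence, Tuple


def dense_candidate_count(
    img_h: int, img_w: int, strides: Sequence[int], anchors_per_cell: int
) -> int:
    """Sum of H * W * k over feature levels, rounding each map size up."""
    total = 0
    for s in strides:
        h = math.ceil(img_h / s)  # feature-map height at this stride (assumed ceil)
        w = math.ceil(img_w / s)  # feature-map width at this stride (assumed ceil)
        total += h * w * anchors_per_cell
    return total


def init_learnable_proposals(n: int) -> List[List[float]]:
    """N proposal boxes as normalized (cx, cy, w, h).

    Hypothetical whole-image initialization; in a trained detector these
    values would be parameters updated by backpropagation.
    """
    return [[0.5, 0.5, 1.0, 1.0] for _ in range(n)]


def to_xyxy(
    box: Sequence[float], img_h: int, img_w: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


if __name__ == "__main__":
    dense = dense_candidate_count(800, 1333, strides=(8, 16, 32, 64, 128), anchors_per_cell=9)
    proposals = init_learnable_proposals(100)
    print(dense, len(proposals))  # 200700 100
    print(to_xyxy(proposals[0], 800, 1333))  # (0.0, 0.0, 1333.0, 800.0)
```

Under these assumed values the dense count is 200,700, about 2,000 times the $N = 100$ proposals mentioned in the abstract, which is in line with the "up to hundreds of thousands" figure.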
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples for any target class.
Our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes.
(2024-07-17T03:24:09Z)
- PG-RCNN: Semantic Surface Point Generation for 3D Object Detection [19.341260543105548]
Point Generation R-CNN (PG-RCNN) is a novel end-to-end detector for 3D object detection.
It uses a jointly trained RoI point generation module to process contextual information of RoIs.
For every generated point, PG-RCNN assigns a semantic feature that indicates the estimated foreground probability.
(2023-07-24T09:22:09Z)
- Oriented R-CNN for Object Detection [61.78746189807462]
This work proposes an effective and simple oriented object detection framework, termed Oriented R-CNN.
In the first stage, we propose an oriented Region Proposal Network (oriented RPN) that directly generates high-quality oriented proposals in a nearly cost-free manner.
The second stage is the oriented R-CNN head, which refines oriented Regions of Interest (oriented RoIs) and recognizes them.
(2021-08-12T12:47:43Z)
- Probabilistic Robustness Analysis for DNNs based on PAC Learning [14.558877524991752]
We view a DNN as a function $\boldsymbol{f}$ from inputs to outputs, and consider the local robustness property for a given input.
We learn the score difference function $f_i - f_\ell$ with respect to the target label $\ell$ and attacking label $i$.
Our framework can handle very large neural networks like ResNet152 with $6.5$M neurons, and often generates adversarial examples.
(2021-01-25T14:10:52Z)
- OneNet: Towards End-to-End One-Stage Object Detection [39.445348555252785]
Existing one-stage object detectors assign labels by location cost only.
Without classification cost, location cost alone leads to redundant boxes with high confidence scores at inference.
To design an end-to-end one-stage object detector, we propose Minimum Cost Assignment.
OneNet achieves 35.0 AP/80 FPS and 37.7 AP/50 FPS with an image size of 512 pixels.
(2020-12-10T16:15:19Z)
- Corner Proposal Network for Anchor-free, Two-stage Object Detection [174.59360147041673]
The goal of object detection is to determine the class and location of objects in an image.
This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals.
We demonstrate that these two stages are effective solutions for improving recall and precision.
(2020-07-27T19:04:57Z)
- FCOS: A simple and strong anchor-free object detector [111.87691210818194]
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion.
Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes.
In contrast, our proposed detector FCOS is anchor box free, as well as proposal free.
(2020-06-14T01:03:39Z)
- Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation [51.17232267143098]
We propose a novel system named Disp R-CNN for 3D object detection from stereo images.
We use a statistical shape model to generate dense disparity pseudo-ground-truth without the need of LiDAR point clouds.
Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
(2020-04-07T17:48:45Z)
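The OneNet entry in the list above describes Minimum Cost Assignment, in which the matching cost includes a classification term rather than location cost alone. The Python sketch below is a simplified, hypothetical illustration of that idea, not OneNet's implementation: the classification cost is assumed to be the negative predicted probability of the ground-truth class, the location cost is assumed to be an L1 distance between normalized (x1, y1, x2, y2) boxes, and each ground truth takes the single prediction with the lowest weighted sum.

```python
from typing import List, Sequence

Box = Sequence[float]  # normalized (x1, y1, x2, y2); an assumption for this sketch


def l1_box_distance(a: Box, b: Box) -> float:
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minimum_cost_assignment(
    pred_probs: Sequence[Sequence[float]],
    pred_boxes: Sequence[Box],
    gt_labels: Sequence[int],
    gt_boxes: Sequence[Box],
    w_cls: float = 1.0,
    w_loc: float = 1.0,
) -> List[int]:
    """Return, for each ground truth, the index of its lowest-cost prediction."""
    matches = []
    for label, gt_box in zip(gt_labels, gt_boxes):
        costs = [
            # Simplified classification cost: a higher probability for the
            # ground-truth class gives a lower (more negative) cost.
            w_cls * (-probs[label]) + w_loc * l1_box_distance(box, gt_box)
            for probs, box in zip(pred_probs, pred_boxes)
        ]
        # One positive sample per ground truth: the prediction with minimum cost.
        matches.append(min(range(len(costs)), key=costs.__getitem__))
    return matches


if __name__ == "__main__":
    gt_boxes = [(0.10, 0.10, 0.50, 0.50)]
    gt_labels = [0]
    pred_boxes = [
        (0.10, 0.10, 0.50, 0.50),  # perfectly localized, low class-0 score
        (0.12, 0.10, 0.50, 0.52),  # slightly off, high class-0 score
    ]
    pred_probs = [[0.1, 0.9], [0.9, 0.1]]
    # Location cost only: picks the well-localized but low-scoring prediction.
    print(minimum_cost_assignment(pred_probs, pred_boxes, gt_labels, gt_boxes, w_cls=0.0))  # [0]
    # Classification + location cost: picks the confident prediction.
    print(minimum_cost_assignment(pred_probs, pred_boxes, gt_labels, gt_boxes))  # [1]
```

Because each ground truth takes its own argmin here, nothing in this sketch stops two ground truths from choosing the same prediction; a bipartite (Hungarian) matcher is one common way to enforce a strict one-to-one assignment.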
This list is automatically generated from the titles and abstracts of the papers in this site.