Evaluating Large-Vocabulary Object Detectors: The Devil is in the
Details
- URL: http://arxiv.org/abs/2102.01066v1
- Date: Mon, 1 Feb 2021 18:56:02 GMT
- Title: Evaluating Large-Vocabulary Object Detectors: The Devil is in the
Details
- Authors: Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick
- Abstract summary: We find that the default implementation of AP is neither category independent, nor does it directly reward properly calibrated detectors.
We show that the default implementation produces a gameable metric, where a simple, nonsensical re-ranking policy can improve AP by a large margin.
We benchmark recent advances in large-vocabulary detection and find that many reported gains do not translate to improvements under our new per-class independent evaluation.
- Score: 107.2722027807328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By design, average precision (AP) for object detection aims to treat all
classes independently: AP is computed independently per category and averaged.
On the one hand, this is desirable as it treats all classes, rare to frequent,
equally. On the other hand, it ignores cross-category confidence calibration, a
key property in real-world use cases. Unfortunately, we find that on
imbalanced, large-vocabulary datasets, the default implementation of AP is
neither category independent, nor does it directly reward properly calibrated
detectors. In fact, we show that the default implementation produces a gameable
metric, where a simple, nonsensical re-ranking policy can improve AP by a large
margin. To address these limitations, we introduce two complementary metrics.
First, we present a simple fix to the default AP implementation, ensuring that
it is truly independent across categories as originally intended. We benchmark
recent advances in large-vocabulary detection and find that many reported gains
do not translate to improvements under our new per-class independent
evaluation, suggesting recent improvements may arise from difficult-to-interpret
changes to cross-category rankings. Given the importance of reliably
benchmarking cross-category rankings, we consider a pooled version of AP
(AP-pool) that rewards properly calibrated detectors by directly comparing
cross-category rankings. Finally, we revisit classical approaches for
calibration and find that explicitly calibrating detectors improves
state-of-the-art on AP-pool by 1.7 points.
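To make the distinction concrete, here is a small, illustrative Python sketch (not the authors' evaluation code): detection-to-ground-truth matching is assumed to have been done already, and AP is computed in its simple uninterpolated form rather than the 101-point COCO/LVIS variant. It contrasts the default per-class AP, which ranks each category separately, with a pooled AP in the spirit of AP-pool, which builds a single ranking across categories and therefore rewards cross-category calibration.

```python
from collections import defaultdict

def average_precision(dets, num_gt):
    """dets: list of (score, is_tp); num_gt: number of ground-truth boxes.
    Simple uninterpolated AP: mean precision at each true-positive rank."""
    if num_gt == 0:
        return float("nan")
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    tp, ap = 0, 0.0
    for rank, (_, is_tp) in enumerate(dets, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank
    return ap / num_gt

def per_class_ap(dets, gts_per_class):
    """Default-style AP: rank detections within each class, then average over classes."""
    by_class = defaultdict(list)
    for score, is_tp, cls in dets:
        by_class[cls].append((score, is_tp))
    aps = [average_precision(by_class[c], n) for c, n in gts_per_class.items()]
    return sum(aps) / len(aps)

def pooled_ap(dets, gts_per_class):
    """AP-pool-style: one ranking over all classes, so cross-category calibration matters."""
    pooled = [(score, is_tp) for score, is_tp, _ in dets]
    return average_precision(pooled, sum(gts_per_class.values()))

# Toy example: detector B's "cat" scores are systematically inflated.
gts = {"cat": 2, "dog": 2}
det_a = [(0.9, True, "cat"), (0.8, True, "dog"), (0.6, True, "dog"), (0.5, False, "cat")]
det_b = [(0.9, True, "cat"), (0.8, False, "cat"), (0.6, True, "dog"), (0.5, True, "dog")]
for name, d in [("A (calibrated)", det_a), ("B (miscalibrated)", det_b)]:
    print(name, "per-class AP:", round(per_class_ap(d, gts), 3),
          "pooled AP:", round(pooled_ap(d, gts), 3))
# Both detectors get the same per-class AP (0.75); only the pooled AP
# drops for the miscalibrated detector (0.75 vs ~0.60).
```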
Related papers
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking of its bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection [19.940491797959407]
In this work, we revisit the average precision (AP) loss and reveal that the crucial element is the selection of ranking pairs between positive and negative samples.
We propose two strategies to improve the AP loss. The first is a novel Adaptive Pairwise Error (APE) loss that focuses on ranking pairs in both positive and negative samples.
Experiments conducted on the MSCOCO dataset support our analysis and demonstrate the superiority of our proposed method compared with current classification and ranking losses.
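For intuition about what "ranking pairs" means here, a minimal sketch follows; it is a generic pairwise ranking loss over positive-negative pairs, not the paper's APE loss or its adaptive pair-selection strategy, and the margin parameter is purely illustrative.

```python
import torch

def pairwise_ranking_loss(scores: torch.Tensor, labels: torch.Tensor,
                          margin: float = 0.0) -> torch.Tensor:
    """Generic pairwise ranking loss over positive-negative pairs.

    scores: predicted confidences for all anchors/samples.
    labels: 1 for positives, 0 for negatives.
    Penalizes every (positive, negative) pair where the negative is not
    ranked below the positive by at least `margin`."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # diff[i, j] = s_neg[j] - s_pos[i]; a pair is misranked when diff + margin > 0.
    diff = neg.unsqueeze(0) - pos.unsqueeze(1)
    return torch.clamp(diff + margin, min=0).mean()
```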
arXiv Detail & Related papers (2022-07-25T10:33:06Z)
- Hierarchical Average Precision Training for Pertinent Image Retrieval [0.0]
This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER).
HAPPIER is based on a new H-AP metric, which integrates the importance of errors and better evaluates rankings.
Experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval.
arXiv Detail & Related papers (2022-07-05T07:55:18Z)
- Beyond mAP: Towards better evaluation of instance segmentation [23.562251593257674]
Average Precision does not penalize duplicate predictions in the high-recall range.
We propose two new measures that explicitly quantify the amount of both spatial and categorical duplicate predictions.
Our Semantic Sorting and NMS can be added as a plug-and-play module to mitigate hedged predictions and preserve AP.
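As a toy illustration (not the measures proposed in that paper), one simple way to expose such behavior is to count same-class predictions that heavily overlap a higher-scored prediction; vanilla AP leaves these low-confidence copies largely unpunished in the high-recall tail.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def count_spatial_duplicates(preds, iou_thr=0.5):
    """preds: list of (score, cls, box). Counts predictions that duplicate a
    higher-scored prediction of the same class (IoU >= iou_thr)."""
    kept, duplicates = [], 0
    for score, cls, box in sorted(preds, key=lambda p: p[0], reverse=True):
        if any(c == cls and iou(box, b) >= iou_thr for c, b in kept):
            duplicates += 1
        else:
            kept.append((cls, box))
    return duplicates
```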
arXiv Detail & Related papers (2022-07-04T17:56:14Z)
- Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate [4.812468844362369]
We introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class.
We show that classification decisions made by simply sorting objects across classes in descending order of their mLPRs can, in theory, ensure the class hierarchy.
In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy.
arXiv Detail & Related papers (2022-05-16T17:43:35Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing a classification loss and a localization loss simultaneously.
The former suffers severely from the extreme foreground-background imbalance caused by the large number of anchors.
This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
arXiv Detail & Related papers (2020-08-17T13:22:01Z)
- Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval [94.73459295405507]
Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks.
We apply Smooth-AP to standard retrieval benchmarks: Stanford Online Products and VehicleID.
We also evaluate on larger-scale datasets: iNaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval.
arXiv Detail & Related papers (2020-07-23T17:52:03Z)
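The core idea, sketched below as a minimal PyTorch paraphrase of the published formulation (not the official Smooth-AP code), is to write a query's AP in terms of pairwise rank indicators 1[s_j > s_i] and to replace the non-differentiable indicator with a temperature-scaled sigmoid, so the objective becomes end-to-end trainable.

```python
import torch

def smooth_ap_loss(scores: torch.Tensor, labels: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    """scores: similarity of every gallery item to one query (shape [N]);
    labels: 1 for items of the query's class, 0 otherwise; tau: sigmoid temperature."""
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)    # diff[i, j] = s_j - s_i
    sig = torch.sigmoid(diff / tau)                     # smooth version of 1[s_j > s_i]
    sig = sig - torch.diag(torch.diagonal(sig))         # remove self-comparisons
    pos = labels.bool()
    rank_all = 1 + sig.sum(dim=1)                       # smoothed rank among all items
    rank_pos = 1 + sig[:, pos].sum(dim=1)               # smoothed rank among positives only
    smooth_ap = (rank_pos[pos] / rank_all[pos]).mean()  # relaxed AP for this query
    return 1 - smooth_ap                                # minimize 1 - AP during training
```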
This list is automatically generated from the titles and abstracts of the papers in this site.