Generalized Focal Loss: Learning Qualified and Distributed Bounding
Boxes for Dense Object Detection
- URL: http://arxiv.org/abs/2006.04388v1
- Date: Mon, 8 Jun 2020 07:24:33 GMT
- Title: Generalized Focal Loss: Learning Qualified and Distributed Bounding
Boxes for Dense Object Detection
- Authors: Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui
Tang and Jian Yang
- Abstract summary: One-stage detectors formulate object detection as dense classification and localization.
A recent trend for one-stage detectors is to introduce a separate prediction branch to estimate localization quality.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization.
- Score: 85.53263670166304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-stage detectors basically formulate object detection as dense
classification and localization. The classification is usually optimized by
Focal Loss and the box location is commonly learned under Dirac delta
distribution. A recent trend for one-stage detectors is to introduce an
individual prediction branch to estimate the quality of localization, where the
predicted quality facilitates the classification to improve detection
performance. This paper delves into the representations of the above three
fundamental elements: quality estimation, classification and localization. Two
problems are discovered in existing practices, including (1) the inconsistent
usage of the quality estimation and classification between training and
inference and (2) the inflexible Dirac delta distribution for localization when
there is ambiguity and uncertainty in complex scenes. To address the problems,
we design new representations for these elements. Specifically, we merge the
quality estimation into the class prediction vector to form a joint
representation of localization quality and classification, and use a vector to
represent arbitrary distribution of box locations. The improved representations
eliminate the inconsistency risk and accurately depict the flexible
distribution in real data, but contain continuous labels, which is beyond the
scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that
generalizes Focal Loss from its discrete form to the continuous version for
successful optimization. On COCO test-dev, GFL achieves 45.0% AP using
ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS
(43.6%) with higher or comparable inference speed, under the same backbone and
training settings. Notably, our best model can achieve a single-model
single-scale AP of 48.2%, at 10 FPS on a single 2080Ti GPU. Code and models
are available at https://github.com/implus/GFocal.
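The abstract describes two concrete loss components that follow from the new representations: a classification vector supervised with continuous quality (IoU-like) targets, and a per-edge box offset modelled as a discrete distribution over bins. The following PyTorch sketch illustrates both as special cases of Generalized Focal Loss; the tensor shapes, the beta value, the bin clamping and the sum reduction are illustrative assumptions, not the authors' exact implementation (the official code is at https://github.com/implus/GFocal).

```python
import torch
import torch.nn.functional as F


def quality_focal_loss(pred_logits, target_quality, beta=2.0):
    """Focal Loss generalized to continuous targets in [0, 1].

    pred_logits:    (N, C) classification logits whose sigmoid jointly encodes
                    class confidence and localization quality.
    target_quality: (N, C) soft labels; 0 for negatives, and an IoU-style
                    quality score at the ground-truth class of positives.
    """
    pred = pred_logits.sigmoid()
    # |y - sigma|^beta plays the role of the focusing term of Focal Loss,
    # but it is well defined for continuous labels y.
    modulating = (target_quality - pred).abs().pow(beta)
    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target_quality, reduction="none")
    return (modulating * bce).sum()


def distribution_focal_loss(dist_logits, target):
    """Learn a flexible discrete distribution over box-offset bins.

    dist_logits: (N, n_bins) logits over discretized offsets 0 .. n_bins-1.
    target:      (N,) continuous offset targets inside [0, n_bins-1].
    """
    n_bins = dist_logits.size(-1)
    left = target.floor().long().clamp(min=0, max=n_bins - 2)
    right = left + 1
    w_left = right.float() - target        # linear weights of the two bins
    w_right = target - left.float()        # that bracket the continuous target
    loss = (F.cross_entropy(dist_logits, left, reduction="none") * w_left
            + F.cross_entropy(dist_logits, right, reduction="none") * w_right)
    return loss.sum()


if __name__ == "__main__":
    # Tiny smoke test with made-up shapes: 8 anchors, 80 classes, 17 offset bins.
    logits = torch.randn(8, 80)
    quality = torch.zeros(8, 80)
    quality[torch.arange(8), torch.randint(80, (8,))] = torch.rand(8)
    offsets = torch.rand(8) * 15.0
    print(quality_focal_loss(logits, quality).item(),
          distribution_focal_loss(torch.randn(8, 17), offsets).item())
```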
Related papers
- Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization.
It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
arXiv Detail & Related papers (2024-04-15T06:02:09Z)
- Latent Enhancing AutoEncoder for Occluded Image Classification [2.6217304977339473]
We introduce LEARN (Latent Enhancing feAture Reconstruction Network), an auto-encoder based network that can be incorporated into a classification model before its head.
On the OccludedPASCAL3D+ dataset, the proposed LEARN outperforms standard classification models.
arXiv Detail & Related papers (2024-02-10T12:22:31Z)
- Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study a practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
- Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation [8.27771856472078]
We present Chaos to Order (CtO), a novel approach for source-free domain adaptation (SFDA).
CtO strives to constrain semantic credibility and propagate label information among target subpopulations.
Empirical evidence demonstrates that CtO outperforms the state of the art on three public benchmarks.
arXiv Detail & Related papers (2023-01-20T03:39:35Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation, α-BN, to calibrate the batch statistics.
We also present a novel loss function to form a unified test-time adaptation framework, Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection [78.11775981796367]
GFLV2 (ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by an absolute 2.6 AP on COCO test-dev.
Code will be available at https://github.com/implus/GFocalV2.
arXiv Detail & Related papers (2020-11-25T17:06:37Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)