Generalized Focal Loss V2: Learning Reliable Localization Quality
Estimation for Dense Object Detection
- URL: http://arxiv.org/abs/2011.12885v1
- Date: Wed, 25 Nov 2020 17:06:37 GMT
- Title: Generalized Focal Loss V2: Learning Reliable Localization Quality
Estimation for Dense Object Detection
- Authors: Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
- Abstract summary: GFLV2 (ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by an absolute 2.6 AP on COCO test-dev.
Code will be available at https://github.com/implus/GFocalV2.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localization Quality Estimation (LQE) is crucial and popular in the recent
advancement of dense object detectors since it can provide accurate ranking
scores that benefit the Non-Maximum Suppression processing and improve
detection performance. As a common practice, most existing methods predict LQE
scores through vanilla convolutional features shared with object classification
or bounding box regression. In this paper, we explore a completely novel and
different perspective to perform LQE -- based on the learned distributions of
the four parameters of the bounding box. The bounding box distributions,
introduced as the "General Distribution" in GFLV1, describe the
uncertainty of the predicted bounding boxes well. Such a property makes the
distribution statistics of a bounding box highly correlated to its real
localization quality. Specifically, a bounding box distribution with a sharp
peak usually corresponds to high localization quality, and vice versa. By
leveraging the close correlation between distribution statistics and the real
localization quality, we develop a considerably lightweight Distribution-Guided
Quality Predictor (DGQP) for reliable LQE based on GFLV1, thus producing GFLV2.
To the best of our knowledge, this is the first attempt in object detection to use a
highly relevant statistical representation to facilitate LQE. Extensive
experiments demonstrate the effectiveness of our method. Notably, GFLV2
(ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous
state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by absolute 2.6 AP on COCO
test-dev, without sacrificing efficiency in either training or
inference. Code will be available at https://github.com/implus/GFocalV2.
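As a concrete illustration of the DGQP idea described in the abstract, here is a minimal PyTorch sketch: simple statistics (Top-k values plus the mean) of the learned per-side distributions feed a tiny MLP that outputs a scalar quality factor, which then rescales the classification score used for NMS ranking. The exact k, hidden width, and bin count below are illustrative assumptions, not the paper's verified configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DGQP(nn.Module):
    """Distribution-Guided Quality Predictor (sketch, not the official code).

    Maps statistics of the learned "General Distribution" over the four box
    sides to a scalar localization-quality factor in [0, 1]. Hyperparameters
    (topk, hidden) are assumptions for illustration.
    """

    def __init__(self, reg_max=16, topk=4, hidden=64):
        super().__init__()
        self.topk = topk
        in_dim = 4 * (topk + 1)  # (top-k values + mean) for each of 4 sides
        self.fc = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, reg_logits):
        # reg_logits: (N, 4, reg_max + 1) logits of the discrete distribution
        # for each box side (left, top, right, bottom).
        prob = F.softmax(reg_logits, dim=-1)
        topk_vals, _ = prob.topk(self.topk, dim=-1)       # sharpness cues
        mean_val = prob.mean(dim=-1, keepdim=True)
        stats = torch.cat([topk_vals, mean_val], dim=-1)  # (N, 4, topk + 1)
        return self.fc(stats.flatten(1))                  # (N, 1) quality

# Usage: the quality factor rescales classification scores so that NMS
# ranks boxes by a joint classification-quality confidence.
logits = torch.randn(8, 4, 17)       # 8 locations, reg_max = 16
quality = DGQP()(logits)             # (8, 1), values in [0, 1]
cls_score = torch.rand(8, 80)        # e.g. 80 COCO classes
joint_score = cls_score * quality    # ranking score fed to NMS
```

A sharply peaked distribution concentrates most of its mass in its Top-k values, so the statistics above rise with sharpness, matching the correlation the abstract describes.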
Related papers
- Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning [44.34372470957298]
We propose a method for boosting the generalization ability of pre-trained vision-language models (VLMs).
The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution.
With the help of OOD detectors, the harmonic means of CoOp and ProGrad increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting (a minimal routing sketch under stated assumptions appears after this list).
arXiv Detail & Related papers (2024-03-31T08:28:42Z)
- Being Aware of Localization Accuracy By Generating Predicted-IoU-Guided Quality Scores [24.086202809990795]
We develop an elegant LQE branch that acquires a localization quality score guided by the predicted IoU.
A novel one-stage detector termed CLQ is proposed.
Experiments show that CLQ achieves state-of-the-art performance with 47.8 AP at a speed of 11.5 FPS (an IoU sketch follows this list).
arXiv Detail & Related papers (2023-09-23T05:27:59Z)
- Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes [58.2797274877934]
We propose DIStribution-aware CalibratiOn (DISCO) to model the spatial distribution of proposals for calibrating supervision signals.
Three distribution-aware techniques are developed to improve classification, localization, and interpretability.
arXiv Detail & Related papers (2023-08-23T09:20:05Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch (a plain MMD sketch appears after this list).
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Source-Free Progressive Graph Learning for Open-Set Domain Adaptation [44.63301903324783]
Open-set domain adaptation (OSDA) has gained considerable attention in many visual recognition tasks.
We propose a Progressive Graph Learning (PGL) framework that decomposes the target hypothesis space into the shared and unknown subspaces.
We also tackle a more realistic source-free open-set domain adaptation (SF-OSDA) setting that makes no assumption about the coexistence of source and target domains.
arXiv Detail & Related papers (2022-02-13T01:19:41Z)
- Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points [19.891597817559038]
Federated Learning (FL) is a promising framework with great potential for preserving privacy and lowering the computation load on the cloud.
Recent work raised concerns about two popular FL methods: (1) their fixed points do not correspond to stationary points of the original optimization problem, and (2) the common model found might not generalize well locally.
We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates.
arXiv Detail & Related papers (2021-06-29T09:59:43Z)
- Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation [85.22775182688798]
This work proposes a novel, flexible, and accurate refinement module called Alpha-Refine.
It can significantly improve the base trackers' box estimation quality.
Experiments on TrackingNet, LaSOT, GOT-10K, and VOT 2020 benchmarks show that our approach significantly improves the base trackers' performance with little extra latency.
arXiv Detail & Related papers (2020-12-12T13:33:25Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [85.53263670166304]
One-stage detectors basically formulate object detection as dense classification and localization.
A recent trend for one-stage detectors is to introduce an individual prediction branch that estimates localization quality.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification, and localization (a sketch of decoding a box side from its learned distribution follows this list).
arXiv Detail & Related papers (2020-06-08T07:24:33Z)
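The sketches below unpack the more technical entries above; all are minimal Python illustrations under stated assumptions, not the papers' released code. First, for the OOD-based prompt-tuning entry: one plausible realization routes each sample to base- or novel-distribution predictions using an OOD score; the max-softmax score and the threshold tau are assumptions, not the paper's detector.

```python
import torch

def route_predictions(base_logits, novel_logits, tau=0.5):
    """Route each sample to base- or novel-model predictions (sketch).

    Uses 1 - max softmax probability of the base model as an OOD score;
    the actual detector and fusion rule in the paper may differ.
    """
    ood_score = 1.0 - base_logits.softmax(dim=-1).max(dim=-1).values  # (N,)
    use_novel = (ood_score > tau).unsqueeze(-1)                       # (N, 1)
    return torch.where(use_novel, novel_logits, base_logits)
```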
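For the predicted-IoU-guided CLQ entry, the supervision target is the IoU between a predicted box and its ground truth; a standard axis-aligned IoU in (x1, y1, x2, y2) format (the box format is an assumption):

```python
import torch

def box_iou(a, b):
    """Element-wise IoU of boxes a, b with shape (N, 4) as (x1, y1, x2, y2)."""
    lt = torch.max(a[:, :2], b[:, :2])       # intersection top-left
    rb = torch.min(a[:, 2:], b[:, 2:])       # intersection bottom-right
    wh = (rb - lt).clamp(min=0)              # zero width/height if disjoint
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)
```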
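For the Divide and Contrast entry, a plain RBF-kernel MMD between two feature sets; the memory bank is omitted and the kernel bandwidth is an assumption:

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Squared MMD between feature sets x (N, D) and y (M, D), RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)            # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```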
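Finally, tying the GFLV1 entry back to the main abstract: the "General Distribution" models each box side as a discrete distribution decoded by its expectation, which is what makes the sharpness statistics in the DGQP sketch meaningful; the bin count is an assumption.

```python
import torch
import torch.nn.functional as F

def decode_side(logits, reg_max=16):
    """Decode one box side from its learned discrete distribution (sketch).

    The side offset is the expectation over reg_max + 1 bins; a sharp,
    low-entropy peak typically signals a well-localized prediction, the
    correlation GFLV2's quality predictor exploits.
    """
    prob = F.softmax(logits, dim=-1)                       # (N, reg_max + 1)
    bins = torch.arange(reg_max + 1, dtype=prob.dtype,
                        device=prob.device)                # bin centers
    return (prob * bins).sum(dim=-1)                       # expected offset
```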