Fugu-MT Paper Translation (Abstract): Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Paper overview: Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

arxiv url: http://arxiv.org/abs/2006.04388v1
Date: Mon, 8 Jun 2020 07:24:33 GMT
Status: Translation complete
Last updated in system: 2022-11-24 02:02:40.236830
Title: Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Title (reference translation): Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Authors: Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang and Jian Yang
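Abstract summary: One-stage detectors basically formulate object detection as dense classification and localization. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization. This paper examines the representations of three fundamental elements: quality estimation, classification, and localization.
Reference score (independently computed attention score): 85.53263670166304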
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One-stage detectors basically formulate object detection as dense classification and localization. The classification is usually optimized by Focal Loss, and the box location is commonly learned under a Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification, and localization. Two problems are discovered in existing practices: (1) the inconsistent usage of the quality estimation and classification between training and inference, and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address these problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but they contain continuous labels, which are beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL), which generalizes Focal Loss from its discrete form to a continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using a ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model achieves a single-model single-scale AP of 48.2% at 10 FPS on a single 2080Ti GPU. Code and models are available at https://github.com/implus/GFocal.
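Abstract (reference translation): One-stage detectors basically formulate object detection as dense classification and localization. Classification is usually optimized with Focal Loss, and box locations are commonly learned under a Dirac delta distribution. A recent trend in one-stage detectors is to introduce a separate prediction branch that estimates localization quality, where the predicted quality assists classification and improves detection performance. This paper examines the representations of the three fundamental elements above: quality estimation, classification, and localization. Two problems are identified in existing methods: (1) quality estimation and classification are used inconsistently between training and inference, and (2) the Dirac delta distribution is too inflexible for localization when complex scenes contain ambiguity and uncertainty. To address these problems, new representations are designed for these elements. Specifically, the quality estimate is merged into the class prediction vector to form a joint representation of localization quality and classification, and a vector is used to represent an arbitrary distribution of box locations. The improved representations eliminate the risk of inconsistency and accurately depict the flexible distribution in real data, but they contain continuous labels, which are beyond the scope of Focal Loss. Generalized Focal Loss (GFL) is therefore proposed, generalizing Focal Loss from its discrete form to a continuous version so that optimization succeeds. On COCO test-dev, GFL achieves 45.0% AP with a ResNet-101 backbone, surpassing the state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed under the same backbone and training settings. Notably, the best model achieves a single-model, single-scale AP of 48.2% at 10 FPS on a single 2080Ti GPU. Code and models are available at https://github.com/implus/GFocal.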
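
The two representations described in the abstract can be sketched in a few lines of Python. The abstract does not state the loss formulas, so the forms below are an assumption based on how GFL is commonly described: a focal-style modulating factor |y - sigma|^beta applied to a soft-label cross-entropy, and a cross-entropy over the two bins adjacent to a continuous box target. The function names, the beta=2.0 default, and the unit bin spacing are illustrative and are not taken from the authors' code.

```python
import math


def quality_focal_loss(sigma, y, beta=2.0, eps=1e-12):
    """Focal-style loss for a continuous target y in [0, 1] (assumed form).

    sigma: predicted joint classification/quality score in (0, 1).
    y:     soft label, e.g. the IoU of the predicted box for the true class,
           or 0 for a negative sample.
    """
    sigma = min(max(sigma, eps), 1.0 - eps)  # keep log() finite
    cross_entropy = -((1.0 - y) * math.log(1.0 - sigma) + y * math.log(sigma))
    return abs(y - sigma) ** beta * cross_entropy


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits):
    """Decode one box side as the mean of a discrete distribution over bins 0..n."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


def distribution_focal_loss(logits, y, eps=1e-12):
    """Cross-entropy on the two bins nearest to a continuous target (assumed form).

    Assumes unit bin spacing and 0 <= y <= len(logits) - 1.
    """
    probs = softmax(logits)
    left = min(int(math.floor(y)), len(logits) - 2)
    right = left + 1
    return -((right - y) * math.log(probs[left] + eps)
             + (y - left) * math.log(probs[right] + eps))


if __name__ == "__main__":
    print(quality_focal_loss(0.3, 0.8))               # soft label from an IoU of 0.8
    print(expected_offset([0.0, 2.0, 2.0, 0.0, 0.0])) # mass centered between bins 1 and 2
    print(distribution_focal_loss([0.0, 2.0, 2.0, 0.0, 0.0], 1.5))
```

With a hard label y = 1, this quality loss reduces to (1 - sigma)^beta * (-log sigma), which is the familiar Focal Loss term without class weighting. A one-hot distribution over bins corresponds to the Dirac delta case that the abstract calls inflexible.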

Related papers