Fugu-MT Paper Translation (Abstract): Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

Paper abstract: Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

arxiv url: http://arxiv.org/abs/2201.03176v1
Date: Mon, 10 Jan 2022 06:00:26 GMT
Status: Translation complete
System update date: 2022-01-11 22:35:53.801640
Title: Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond
Title (reference translation): Pedestrian Detection: Domain Generalization, CNNs, Transformers, and More
Authors: Irtiza Hasan, Shengcai Liao, Jinpeng Li, Saad Ullah Akram, and Ling Shao
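Abstract summary: The study finds that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. The limited generalization is attributed to two main factors: the method and the current sources of data. The paper proposes a progressive fine-tuning strategy that improves generalization.
Reference score (independently computed attention score): 82.37430109152383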
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pedestrian detection is the cornerstone of many vision-based applications, from object tracking to video surveillance and, more recently, autonomous driving. With the rapid development of deep learning in object detection, pedestrian detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, in this study on generalizable pedestrian detectors, we show that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. We attribute the limited generalization to two main factors: the method and the current sources of data. Regarding the method, we illustrate that bias present in the design choices (e.g., anchor settings) of current pedestrian detectors is the main contributing factor to the limited generalization. Most modern pedestrian detectors are tailored towards the target dataset, where they do achieve high performance in the traditional single-dataset training and testing pipeline, but their performance degrades in cross-dataset evaluation. Consequently, a general object detector performs better in cross-dataset evaluation than state-of-the-art pedestrian detectors, due to its generic design. As for the data, we show that the autonomous driving benchmarks are monotonous in nature, that is, they are neither diverse in scenarios nor dense in pedestrians. Therefore, benchmarks curated by crawling the web (which contain diverse and dense scenarios) are an efficient source of pre-training for providing a more robust representation. Accordingly, we propose a progressive fine-tuning strategy which improves generalization. Code and models can be accessed at https://github.com/hasanirtiza/Pedestron.
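Abstract (reference translation): Pedestrian detection underpins many vision-based applications, from object tracking to video surveillance and, more recently, autonomous driving. Thanks to the rapid development of deep learning in object detection, pedestrian detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, this study on generalizable pedestrian detectors shows that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. The limited generalization is attributed to two main factors: the method and the current sources of data. Regarding the method, the study shows that bias present in the design choices (e.g., anchor settings) of current pedestrian detectors is the main cause of the limited generalization. Modern pedestrian detectors are tuned to the target dataset so as to achieve high performance in the traditional single-dataset training and testing pipeline, but their performance degrades in cross-dataset evaluation. As a result, a general object detector, owing to its generic design, performs better in cross-dataset evaluation than state-of-the-art pedestrian detectors. As for the data, the study shows that autonomous driving benchmarks are monotonous in nature: they are neither diverse in scenarios nor dense in pedestrians. Benchmarks curated by crawling the web, which contain diverse and dense scenarios, are therefore an efficient source of pre-training for providing a more robust representation. Accordingly, the study proposes a progressive fine-tuning strategy that improves generalization. Code and models are available at https://github.com/hasanirtiza/Pedestron.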
