Fugu-MT Paper Translation (Summary): Efficient Decoder-free Object Detection with Transformers

Paper summary: Efficient Decoder-free Object Detection with Transformers

arxiv url: http://arxiv.org/abs/2206.06829v3
Date: Thu, 16 Jun 2022 01:53:08 GMT
Status: Translation complete
Last updated in system: 2022-06-17 11:50:16.170895
Title: Efficient Decoder-free Object Detection with Transformers
Title (reference translation): Highly Efficient Decoder-free Object Detection Using Transformers
Authors: Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen
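Abstract summary: Vision transformers (ViTs) are changing the landscape of object detection approaches. This paper proposes a decoder-free fully transformer-based (DFFT) object detector. DFFT_SMALL achieves high efficiency in both the training and inference stages.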
Reference score (independently computed attention score): 75.00499377197475
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision transformers (ViTs) are changing the landscape of object detection approaches. A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective but comes at the price of a considerable computational burden at inference. A more subtle usage is the DETR family, which eliminates the need for many hand-designed components in object detection but introduces a decoder that demands an extra-long time to converge. As a result, transformer-based object detection cannot prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector, achieving high efficiency in both training and inference stages for the first time. We simplify object detection into an encoder-only single-level anchor-based dense prediction problem by centering on two entry points: 1) eliminate the training-inefficient decoder and leverage two strong encoders to preserve the accuracy of single-level feature map prediction; 2) explore low-level semantic features for the detection task with limited computational resources. In particular, we design a novel lightweight detection-oriented transformer backbone that efficiently captures low-level features with rich semantics, based on a well-conceived ablation study. Extensive experiments on the MS COCO benchmark demonstrate that DFFT_SMALL outperforms DETR by 2.5% AP with a 28% computation cost reduction and more than 10x fewer training epochs. Compared with the cutting-edge anchor-based detector RetinaNet, DFFT_SMALL obtains an AP gain of over 5.5% while cutting computation cost by 70%.
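The abstract reframes detection as an encoder-only, single-level, anchor-based dense prediction problem. As a rough illustration of what "single-level anchor-based dense prediction" means, the sketch below places anchors at every cell of one feature map and decodes a per-anchor score and box offset. It is not the DFFT implementation: the anchor sizes, aspect ratios, and stride, the (dx, dy, dw, dh) offset encoding (the common Faster R-CNN/RetinaNet convention), and the helper names `generate_anchors`, `decode`, and `predict_dense` are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: anchor sizes, ratios, stride, the (dx, dy, dw, dh)
# encoding, and all function names are assumptions, not taken from the DFFT paper.
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2) in input-image pixels
Deltas = Tuple[float, float, float, float]   # (dx, dy, dw, dh) predicted per anchor


def generate_anchors(feat_h: int, feat_w: int, stride: int,
                     sizes: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """Place len(sizes) * len(ratios) anchors at every cell of a single feature map."""
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Anchor center: center of the image region covered by this cell.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:
                    # ratio = height / width; the anchor area stays size ** 2.
                    w, h = size / math.sqrt(ratio), size * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def decode(anchor: Box, deltas: Deltas) -> Box:
    """Apply (dx, dy, dw, dh) offsets to an anchor (Faster R-CNN-style encoding)."""
    x1, y1, x2, y2 = anchor
    aw, ah = x2 - x1, y2 - y1
    acx, acy = x1 + aw / 2, y1 + ah / 2
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah        # shift center by a fraction of anchor size
    w, h = aw * math.exp(dw), ah * math.exp(dh)  # scale width/height in log space
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def predict_dense(anchors: Sequence[Box], scores: Sequence[float],
                  deltas: Sequence[Deltas], threshold: float) -> List[Tuple[Box, float]]:
    """Dense prediction: one (score, deltas) pair per anchor; keep confident ones.

    A complete detector would usually follow this with non-maximum suppression,
    which is omitted here.
    """
    return [(decode(a, d), s)
            for a, s, d in zip(anchors, scores, deltas) if s > threshold]


if __name__ == "__main__":
    # Hypothetical 2x3 single-level feature map at stride 16, one size, three ratios.
    anchors = generate_anchors(2, 3, 16, sizes=[32.0], ratios=[0.5, 1.0, 2.0])
    print(len(anchors))  # 18 = 2 * 3 cells * 3 anchors per cell
    scores = [0.1] * len(anchors)
    scores[4] = 0.9  # pretend the head is confident about one anchor
    zero = (0.0, 0.0, 0.0, 0.0)
    print(predict_dense(anchors, scores, [zero] * len(anchors), threshold=0.5))
```

Because every anchor on the one feature map receives its own score and offsets, this style of head needs neither a decoder nor multiple feature levels. Per the abstract, DFFT relies on two strong encoders to keep single-level prediction accurate; this sketch does not model those encoders or the backbone.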
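Abstract (reference translation): Vision transformers (ViTs) are changing the landscape of object detection approaches. A natural use of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone. A more subtle use is the DETR family, which removes the need for many hand-designed components in object detection but introduces a decoder that takes an extra-long time to converge. As a result, transformer-based object detection cannot prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector that, for the first time, achieves high efficiency in both training and inference. We simplify object detection into an encoder-only, single-level, anchor-based dense prediction problem by centering on two entry points: 1) eliminating the training-inefficient decoder and leveraging two strong encoders to preserve the accuracy of single-level feature map prediction; 2) exploring low-level semantic features for the detection task with limited computational resources. In particular, we design a lightweight detection-oriented transformer backbone that efficiently captures low-level features with rich semantics. Extensive experiments on the MS COCO benchmark show that DFFT_SMALL outperforms DETR by 2.5% AP with a 28% computation cost reduction and more than 10x fewer training epochs. Compared with the state-of-the-art anchor-based detector RetinaNet, DFFT_SMALL obtains an AP gain of over 5.5% while cutting computation cost by 70%.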
