Recurrent Glimpse-based Decoder for Detection with Transformer
- URL: http://arxiv.org/abs/2112.04632v1
- Date: Thu, 9 Dec 2021 00:29:19 GMT
- Title: Recurrent Glimpse-based Decoder for Detection with Transformer
- Authors: Zhe Chen, Jing Zhang, Dacheng Tao
- Abstract summary: We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
- Score: 85.64521612986456
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although detection with Transformer (DETR) is increasingly popular, its
global attention modeling requires an extremely long training period to
optimize and achieve promising detection performance. As an alternative to
existing studies that mainly develop advanced feature or embedding designs to
tackle the training issue, we point out that Region-of-Interest (RoI)-based detection
refinement can easily help mitigate the difficulty of training for DETR
methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder
(REGO) in this paper. In particular, the REGO employs a multi-stage recurrent
processing structure to help the attention of DETR gradually focus on
foreground objects more accurately. In each processing stage, visual features
are extracted as glimpse features from RoIs with enlarged bounding box areas of
detection results from the previous stage. Then, a glimpse-based decoder is
introduced to provide refined detection results based on both the glimpse
features and the attention modeling outputs of the previous stage. In practice,
REGO can be easily embedded in representative DETR variants while maintaining
their fully end-to-end training and inference pipelines. In particular, REGO
helps Deformable DETR achieve 44.8 AP on the MSCOCO dataset with only 36
training epochs, compared with the first DETR and the Deformable DETR that
require 500 and 50 epochs to achieve comparable performance, respectively.
Experiments also show that REGO consistently boosts the performance of
different DETR detectors by up to 7% relative gain at the same setting of 50
training epochs. Code is available via
https://github.com/zhechen/Deformable-DETR-REGO.
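The per-stage RoI enlargement described in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the helper name, the fixed scale factor, and the commented stage loop are assumptions, and the actual glimpse feature extraction and glimpse-based decoder are omitted.

```python
def enlarge_boxes(boxes, scale, img_w, img_h):
    """Enlarge each detected box about its centre by `scale`, clipped to the
    image, so the next stage can extract glimpse features from a wider RoI
    than the previous stage's raw detection."""
    out = []
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        w, h = (x2 - x1) * scale, (y2 - y1) * scale
        out.append((max(0.0, cx - w / 2), max(0.0, cy - h / 2),
                    min(float(img_w), cx + w / 2), min(float(img_h), cy + h / 2)))
    return out

# Hypothetical per-stage refinement loop (decoder details omitted):
# for stage in range(num_stages):
#     rois = enlarge_boxes(boxes, scale=2.0, img_w=W, img_h=H)
#     glimpse_feats = roi_align(feature_map, rois)        # assumed helper
#     boxes, attn = glimpse_decoder(glimpse_feats, attn)  # assumed helper
```

Clipping to the image boundary keeps enlarged RoIs valid for boxes near the border.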
Related papers
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z) - DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z) - Revisiting DETR Pre-training for Object Detection [24.372444866927538]
We investigate the shortcomings of DETReg in enhancing the performance of robust DETR-based models under full data conditions.
We employ an optimized approach named Simple Self-training, which leads to marked enhancements through the combination of an improved box predictor and the Objects365 benchmark.
The culmination of these endeavors results in a remarkable AP score of 59.3% on the COCO val set, outperforming H-Deformable-DETR + Swin-L without pre-training by 1.4%.
arXiv Detail & Related papers (2023-08-02T17:39:30Z) - DEYO: DETR with YOLO for Step-by-Step Object Detection [0.0]
This paper proposes a new two-stage object detection model, named DETR with YOLO (DEYO).
The first stage provides high-quality query and anchor feeding into the second stage, improving the performance and efficiency of the second stage compared to the original DETR model.
Experiments demonstrate that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively.
arXiv Detail & Related papers (2022-11-12T06:36:17Z) - Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion [95.7732308775325]
The proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.
DETR suffers from slow training convergence, which hinders its applicability to various detection tasks.
We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
arXiv Detail & Related papers (2022-07-28T15:34:29Z) - Fast Convergence of DETR with Spatially Modulated Co-Attention [83.19863907905666]
We propose a simple yet effective scheme for improving the Detection Transformer framework, namely Spatially Modulated Co-Attention (SMCA) mechanism.
Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder.
Our fully-fledged SMCA can achieve better performance compared to DETR with a dilated convolution-based backbone.
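The core idea of spatial modulation can be sketched as a Gaussian-like weight map centred on a predicted object location, used to bias the decoder's co-attention toward the object region. The function name and the fixed isotropic sigma below are illustrative assumptions; the actual SMCA predicts per-query centres and scales.

```python
import math

def gaussian_spatial_prior(h, w, cx, cy, sigma):
    """Return an h x w weight map that is 1.0 at the predicted centre
    (cx, cy) and decays with squared distance, for modulating raw
    cross-attention weights before normalisation."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]
```

Multiplying attention weights by such a map concentrates them near the estimated box, which is one intuition for the faster convergence reported above.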
arXiv Detail & Related papers (2021-08-05T06:53:19Z) - UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR).
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
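The random query patch pretext task amounts to cropping random regions of an image and treating them as decoder queries to be localised. A toy sketch, where the helper name and the nested-list image representation are assumptions for illustration:

```python
import random

def random_query_patches(img, num_patches, ph, pw, seed=0):
    """Crop `num_patches` random ph x pw patches from `img`, a nested list
    of shape H x W (x C). In the pretext task, such patches are embedded
    and fed to the decoder as queries, and the model must localise them."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    patches = []
    for _ in range(num_patches):
        y = rng.randrange(h - ph + 1)
        x = rng.randrange(w - pw + 1)
        patches.append([row[x:x + pw] for row in img[y:y + ph]])
    return patches
```

Because the crop locations are known, the localisation targets come for free, which is what makes the task unsupervised.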
arXiv Detail & Related papers (2020-11-18T05:16:11Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
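The set-based global loss rests on a one-to-one bipartite matching between predictions and ground-truth objects. A toy sketch using brute-force search over assignments with a plain L1 box cost; DETR itself uses the Hungarian algorithm with a combined class, box, and generalised-IoU cost, so the function name and simplified cost here are assumptions:

```python
from itertools import permutations

def match_predictions(pred_boxes, gt_boxes):
    """Return (total_cost, assignment), where assignment[i] is the index of
    the prediction matched to ground-truth box i, minimising summed L1
    distance. Brute force over permutations, so only for tiny examples."""
    def l1(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_cost, best_assign
```

Because each ground-truth box is matched to exactly one prediction, the loss penalises duplicate detections directly, which is what removes the need for non-maximum suppression.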
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.