Oriented Object Detection with Transformer
- URL: http://arxiv.org/abs/2106.03146v1
- Date: Sun, 6 Jun 2021 14:57:17 GMT
- Title: Oriented Object Detection with Transformer
- Authors: Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin
Han, Errui Ding, Baochang Zhang, David Doermann
- Abstract summary: We implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network.
We design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution.
Our $\rm O^2DETR$ can serve as a new benchmark in the field of oriented object detection, achieving up to 3.85 mAP improvement over Faster R-CNN and RetinaNet.
- Score: 51.634913687632604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection with Transformers (DETR) has achieved performance
competitive with traditional detectors such as Faster R-CNN. However, the
potential of DETR remains largely unexplored for the more challenging task of
arbitrary-oriented object detection. We provide the first attempt and
implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on
an end-to-end network. The contributions of $\rm O^2DETR$ include: 1) we
provide a new insight into oriented object detection by applying Transformer
to localize objects directly and efficiently, without the tedious design of
rotated anchors used in conventional detectors; 2) we design a simple but highly
efficient encoder for Transformer by replacing the attention mechanism with
depthwise separable convolution, which significantly reduces the memory and
computational cost of using multi-scale features in the original Transformer;
3) our $\rm O^2DETR$ can serve as a new benchmark in the field of oriented
object detection, achieving up to 3.85 mAP improvement over Faster R-CNN
and RetinaNet. We simply fine-tune the head mounted on $\rm O^2DETR$ in a
cascaded architecture and achieve performance competitive with the state of
the art on the DOTA dataset.
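To make the encoder design in 2) concrete, here is a minimal PyTorch sketch of an encoder layer in which a depthwise separable convolution takes the place of multi-head self-attention. The class name, kernel size, normalization choice, and FFN width are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch: a depthwise separable convolution block standing in for the
# self-attention layer of a Transformer encoder, as described in the abstract.
import torch
import torch.nn as nn

class DepthwiseSeparableEncoderLayer(nn.Module):
    """Encoder layer whose token-mixing step is a depthwise separable conv
    applied to the 2D feature map instead of multi-head self-attention."""

    def __init__(self, channels: int, kernel_size: int = 3, ffn_dim: int = 1024):
        super().__init__()
        # Depthwise conv: one spatial filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # Pointwise 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm1 = nn.GroupNorm(32, channels)
        # Position-wise feed-forward network, kept from the standard encoder.
        self.ffn = nn.Sequential(
            nn.Conv2d(channels, ffn_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ffn_dim, channels, 1))
        self.norm2 = nn.GroupNorm(32, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature map from one backbone/FPN level.
        x = self.norm1(x + self.pointwise(self.depthwise(x)))
        x = self.norm2(x + self.ffn(x))
        return x

# Example: a 256-channel feature map of size 64x64.
feat = torch.randn(2, 256, 64, 64)
print(DepthwiseSeparableEncoderLayer(256)(feat).shape)  # torch.Size([2, 256, 64, 64])
```

Because the convolution has a fixed local receptive field, its cost grows linearly with the number of feature-map pixels, which is what makes multi-scale inputs affordable compared with full self-attention.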
Related papers
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Efficient DETR: Improving End-to-End Object Detector with Dense Prior [7.348184873564071]
We propose Efficient DETR, a simple and efficient pipeline for end-to-end object detection.
By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages the dense prior to initialize the object containers.
Experiments conducted on MS COCO show that our method, with only 3 encoder layers and 1 decoder layer, achieves competitive performance with state-of-the-art object detection methods.
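The "dense prior" idea can be pictured with a short, hedged sketch: score every location of a dense feature map, keep the top-k positions, and reuse their features and coordinates as the initial object queries. The function name, the untrained 1x1 objectness head, and the shapes are illustrative assumptions, not the Efficient DETR code.

```python
# Hedged sketch: initialize sparse object queries ("containers") from the
# top-k locations of a dense, class-agnostic objectness map.
import torch
import torch.nn as nn

def init_queries_from_dense_prior(feat: torch.Tensor, num_queries: int = 100):
    """feat: (B, C, H, W) encoder feature map.
    Returns query embeddings (B, num_queries, C) and
    normalized reference points (B, num_queries, 2)."""
    b, c, h, w = feat.shape
    # In practice this 1x1 head is learned jointly; it is untrained here.
    objectness = nn.Conv2d(c, 1, kernel_size=1)(feat)           # (B, 1, H, W)
    scores = objectness.flatten(2).squeeze(1)                   # (B, H*W)
    topk = scores.topk(num_queries, dim=1).indices              # (B, num_queries)

    tokens = feat.flatten(2).transpose(1, 2)                    # (B, H*W, C)
    queries = torch.gather(tokens, 1, topk.unsqueeze(-1).expand(-1, -1, c))

    ys = torch.div(topk, w, rounding_mode='floor').float() / h  # y in [0, 1)
    xs = (topk % w).float() / w                                 # x in [0, 1)
    ref_points = torch.stack([xs, ys], dim=-1)                  # (B, num_queries, 2)
    return queries, ref_points

q, r = init_queries_from_dense_prior(torch.randn(2, 256, 32, 32))
print(q.shape, r.shape)  # torch.Size([2, 100, 256]) torch.Size([2, 100, 2])
```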
arXiv Detail & Related papers (2021-04-03T06:14:24Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
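As one plausible reading of "modulate" (an assumption, not the actual CTBlender module), pooled Transformer features can produce per-channel gates that re-weight the CNN feature map at a given scale:

```python
# Generic sketch of semantic-feature modulation; names and the gating scheme
# are illustrative assumptions, not the DA-DETR implementation.
import torch
import torch.nn as nn

class ChannelModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Maps a pooled Transformer descriptor to per-channel gates in (0, 1).
        self.to_gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, cnn_feat: torch.Tensor, trans_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat: (B, C, H, W) spatial features; trans_tokens: (B, N, C) Transformer outputs.
        gate = self.to_gate(trans_tokens.mean(dim=1))   # (B, C) semantic gating
        return cnn_feat * gate[:, :, None, None]        # broadcast over H, W

fused = ChannelModulation(256)(torch.randn(2, 256, 64, 64), torch.randn(2, 100, 256))
print(fused.shape)  # torch.Size([2, 256, 64, 64])
```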
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- End-to-End Object Detection with Adaptive Clustering Transformer [37.9114488933667]
A novel variant of Transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.
ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction.
Code is released as supplementary for the ease of experiment replication and verification.
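A heavily simplified sketch of that idea follows, using random-hyperplane hashing as a stand-in for the paper's exact LSH scheme: queries falling into the same bucket are averaged into a prototype, attention is computed only for prototypes, and the result is broadcast back to the bucket members.

```python
# Hedged sketch of LSH-clustered attention; not the released ACT code.
import torch

def lsh_clustered_attention(q, k, v, num_hyperplanes: int = 8):
    # q: (N, D) queries; k, v: (M, D) keys and values.
    n, d = q.shape
    planes = torch.randn(d, num_hyperplanes)
    codes = (q @ planes > 0).long()                                   # (N, H) sign bits
    bucket = (codes * (2 ** torch.arange(num_hyperplanes))).sum(-1)   # (N,) bucket ids
    ids, inverse = bucket.unique(return_inverse=True)                 # cluster assignment

    # Prototype = mean of the queries that hash into the same bucket.
    protos = torch.zeros(len(ids), d).index_add_(0, inverse, q)
    counts = torch.zeros(len(ids)).index_add_(0, inverse, torch.ones(n))
    protos = protos / counts[:, None]

    # Scaled dot-product attention, computed only for the prototypes.
    attn = torch.softmax(protos @ k.T / d ** 0.5, dim=-1)             # (P, M)
    out_protos = attn @ v                                             # (P, D)
    return out_protos[inverse]                                        # broadcast to all N queries

out = lsh_clustered_attention(torch.randn(500, 64), torch.randn(300, 64), torch.randn(300, 64))
print(out.shape)  # torch.Size([500, 64])
```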
arXiv Detail & Related papers (2020-11-18T14:36:37Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
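The bipartite-matching step behind that set-based loss can be sketched in a few lines; the cost weights and the plain L1 box cost below are illustrative choices, not DETR's exact settings.

```python
# Hedged sketch: match N predictions to M ground-truth objects by solving a
# classification + box cost matrix with the Hungarian algorithm.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes):
    # pred_logits: (N, num_classes) raw scores, pred_boxes: (N, 4) in cxcywh.
    # gt_labels: (M,) class ids, gt_boxes: (M, 4).
    prob = pred_logits.softmax(-1)                        # (N, num_classes)
    cost_class = -prob[:, gt_labels]                      # (N, M): high prob -> low cost
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)    # (N, M) L1 box distance
    cost = cost_class + 5.0 * cost_bbox                   # weighted total cost
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx                               # matched (prediction, gt) pairs

pi, gi = hungarian_match(torch.randn(100, 92), torch.rand(100, 4),
                         torch.tensor([3, 17]), torch.rand(2, 4))
print(list(zip(pi, gi)))  # e.g. [(12, 0), (57, 1)]
```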
arXiv Detail & Related papers (2020-05-26T17:06:38Z)