Oriented Object Detection with Transformer
- URL: http://arxiv.org/abs/2106.03146v1
- Date: Sun, 6 Jun 2021 14:57:17 GMT
- Title: Oriented Object Detection with Transformer
- Authors: Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin
Han, Errui Ding, Baochang Zhang, David Doermann
- Abstract summary: We implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network.
We design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution.
Our $\rm O^2DETR$ can serve as a new benchmark in the field of oriented object detection, achieving up to 3.85 mAP improvement over Faster R-CNN and RetinaNet.
- Score: 51.634913687632604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection with Transformers (DETR) has achieved performance
competitive with traditional detectors such as Faster R-CNN. However, the
potential of DETR remains largely unexplored for the more challenging task of
arbitrary-oriented object detection. We provide the first attempt and
implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on
an end-to-end network. The contributions of $\rm O^2DETR$ include: 1) we
provide a new insight into oriented object detection by applying Transformer
to localize objects directly and efficiently, without the tedious design of
rotated anchors used in conventional detectors; 2) we design a simple but highly
efficient encoder for Transformer by replacing the attention mechanism with
depthwise separable convolution, which significantly reduces the memory and
computational cost of using multi-scale features in the original Transformer;
3) our $\rm O^2DETR$ can serve as a new benchmark in the field of oriented
object detection, achieving up to 3.85 mAP improvement over Faster R-CNN
and RetinaNet. We simply fine-tune the head mounted on $\rm O^2DETR$ in a
cascaded architecture and achieve performance competitive with the state of
the art on the DOTA dataset.
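To make the encoder design in 2) concrete, here is a minimal PyTorch sketch of an encoder layer in which a depthwise separable convolution takes the place of multi-head self-attention. The class name, kernel size, normalization choice, and FFN width are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch: a depthwise separable convolution block standing in for the
# self-attention layer of a Transformer encoder, as described in the abstract.
import torch
import torch.nn as nn

class DepthwiseSeparableEncoderLayer(nn.Module):
    """Encoder layer whose token-mixing step is a depthwise separable conv
    applied to the 2D feature map instead of multi-head self-attention."""

    def __init__(self, channels: int, kernel_size: int = 3, ffn_dim: int = 1024):
        super().__init__()
        # Depthwise conv: one spatial filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # Pointwise 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.norm1 = nn.GroupNorm(32, channels)
        # Position-wise feed-forward network, kept from the standard encoder.
        self.ffn = nn.Sequential(
            nn.Conv2d(channels, ffn_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ffn_dim, channels, 1))
        self.norm2 = nn.GroupNorm(32, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature map from one backbone/FPN level.
        x = self.norm1(x + self.pointwise(self.depthwise(x)))
        x = self.norm2(x + self.ffn(x))
        return x

# Example: a 256-channel feature map of size 64x64.
feat = torch.randn(2, 256, 64, 64)
print(DepthwiseSeparableEncoderLayer(256)(feat).shape)  # torch.Size([2, 256, 64, 64])
```

Because the convolution has a fixed local receptive field, its cost grows linearly with the number of feature-map pixels, which is what makes multi-scale inputs affordable compared with full self-attention.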
Related papers
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Efficient DETR: Improving End-to-End Object Detector with Dense Prior [7.348184873564071]
We propose Efficient DETR, a simple and efficient pipeline for end-to-end object detection.
By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages the dense prior to initialize the object containers.
Experiments conducted on MS COCO show that our method, with only 3 encoder layers and 1 decoder layer, achieves competitive performance with state-of-the-art object detection methods.
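The "dense prior" idea can be pictured with a short, hedged sketch: score every location of a dense feature map, keep the top-k positions, and reuse their features and coordinates as the initial object queries. The function name, the untrained 1x1 objectness head, and the shapes are illustrative assumptions, not the Efficient DETR code.

```python
# Hedged sketch: initialize sparse object queries ("containers") from the
# top-k locations of a dense, class-agnostic objectness map.
import torch
import torch.nn as nn

def init_queries_from_dense_prior(feat: torch.Tensor, num_queries: int = 100):
    """feat: (B, C, H, W) encoder feature map.
    Returns query embeddings (B, num_queries, C) and
    normalized reference points (B, num_queries, 2)."""
    b, c, h, w = feat.shape
    # In practice this 1x1 head is learned jointly; it is untrained here.
    objectness = nn.Conv2d(c, 1, kernel_size=1)(feat)           # (B, 1, H, W)
    scores = objectness.flatten(2).squeeze(1)                   # (B, H*W)
    topk = scores.topk(num_queries, dim=1).indices              # (B, num_queries)

    tokens = feat.flatten(2).transpose(1, 2)                    # (B, H*W, C)
    queries = torch.gather(tokens, 1, topk.unsqueeze(-1).expand(-1, -1, c))

    ys = torch.div(topk, w, rounding_mode='floor').float() / h  # y in [0, 1)
    xs = (topk % w).float() / w                                 # x in [0, 1)
    ref_points = torch.stack([xs, ys], dim=-1)                  # (B, num_queries, 2)
    return queries, ref_points

q, r = init_queries_from_dense_prior(torch.randn(2, 256, 32, 32))
print(q.shape, r.shape)  # torch.Size([2, 100, 256]) torch.Size([2, 100, 2])
```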
arXiv Detail & Related papers (2021-04-03T06:14:24Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
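As one plausible reading of "modulate" (an assumption, not the actual CTBlender module), pooled Transformer features can produce per-channel gates that re-weight the CNN feature map at a given scale:

```python
# Generic sketch of semantic-feature modulation; names and the gating scheme
# are illustrative assumptions, not the DA-DETR implementation.
import torch
import torch.nn as nn

class ChannelModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Maps a pooled Transformer descriptor to per-channel gates in (0, 1).
        self.to_gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, cnn_feat: torch.Tensor, trans_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat: (B, C, H, W) spatial features; trans_tokens: (B, N, C) Transformer outputs.
        gate = self.to_gate(trans_tokens.mean(dim=1))   # (B, C) semantic gating
        return cnn_feat * gate[:, :, None, None]        # broadcast over H, W

fused = ChannelModulation(256)(torch.randn(2, 256, 64, 64), torch.randn(2, 100, 256))
print(fused.shape)  # torch.Size([2, 256, 64, 64])
```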
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- End-to-End Object Detection with Adaptive Clustering Transformer [37.9114488933667]
A novel variant of Transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.
ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction.
Code is released as supplementary for the ease of experiment replication and verification.
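A heavily simplified sketch of that idea follows, using random-hyperplane hashing as a stand-in for the paper's exact LSH scheme: queries falling into the same bucket are averaged into a prototype, attention is computed only for prototypes, and the result is broadcast back to the bucket members.

```python
# Hedged sketch of LSH-clustered attention; not the released ACT code.
import torch

def lsh_clustered_attention(q, k, v, num_hyperplanes: int = 8):
    # q: (N, D) queries; k, v: (M, D) keys and values.
    n, d = q.shape
    planes = torch.randn(d, num_hyperplanes)
    codes = (q @ planes > 0).long()                                   # (N, H) sign bits
    bucket = (codes * (2 ** torch.arange(num_hyperplanes))).sum(-1)   # (N,) bucket ids
    ids, inverse = bucket.unique(return_inverse=True)                 # cluster assignment

    # Prototype = mean of the queries that hash into the same bucket.
    protos = torch.zeros(len(ids), d).index_add_(0, inverse, q)
    counts = torch.zeros(len(ids)).index_add_(0, inverse, torch.ones(n))
    protos = protos / counts[:, None]

    # Scaled dot-product attention, computed only for the prototypes.
    attn = torch.softmax(protos @ k.T / d ** 0.5, dim=-1)             # (P, M)
    out_protos = attn @ v                                             # (P, D)
    return out_protos[inverse]                                        # broadcast to all N queries

out = lsh_clustered_attention(torch.randn(500, 64), torch.randn(300, 64), torch.randn(300, 64))
print(out.shape)  # torch.Size([500, 64])
```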
arXiv Detail & Related papers (2020-11-18T14:36:37Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
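The bipartite-matching step behind that set-based loss can be sketched in a few lines; the cost weights and the plain L1 box cost below are illustrative choices, not DETR's exact settings.

```python
# Hedged sketch: match N predictions to M ground-truth objects by solving a
# classification + box cost matrix with the Hungarian algorithm.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes):
    # pred_logits: (N, num_classes) raw scores, pred_boxes: (N, 4) in cxcywh.
    # gt_labels: (M,) class ids, gt_boxes: (M, 4).
    prob = pred_logits.softmax(-1)                        # (N, num_classes)
    cost_class = -prob[:, gt_labels]                      # (N, M): high prob -> low cost
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)    # (N, M) L1 box distance
    cost = cost_class + 5.0 * cost_bbox                   # weighted total cost
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().numpy())
    return pred_idx, gt_idx                               # matched (prediction, gt) pairs

pi, gi = hungarian_match(torch.randn(100, 92), torch.rand(100, 4),
                         torch.tensor([3, 17]), torch.rand(2, 4))
print(list(zip(pi, gi)))  # e.g. [(12, 0), (57, 1)]
```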
arXiv Detail & Related papers (2020-05-26T17:06:38Z)