End-to-End Object Detection with Adaptive Clustering Transformer
- URL: http://arxiv.org/abs/2011.09315v2
- Date: Mon, 18 Oct 2021 07:15:55 GMT
- Title: End-to-End Object Detection with Adaptive Clustering Transformer
- Authors: Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang,
Hongsheng Li, Hao Dong
- Abstract summary: A novel Transformer variant named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input.
ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction.
Code is released as supplementary material for ease of experiment replication and verification.
- Score: 37.9114488933667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end Object Detection with Transformer (DETR) proposes to perform object
detection with a Transformer and achieves performance comparable to two-stage
object detectors like Faster R-CNN. However, DETR needs huge computational
resources for training and inference due to the high-resolution spatial input.
In this paper, a novel variant of the Transformer named Adaptive Clustering
Transformer (ACT) is proposed to reduce the computation cost for
high-resolution input. ACT clusters the query features adaptively using Locality
Sensitive Hashing (LSH) and approximates the query-key interaction using the
prototype-key interaction. ACT reduces the quadratic O(N^2) complexity inside
self-attention to O(NK), where K is the number of prototypes in each layer.
ACT can be a drop-in module replacing the original self-attention module
without any training. ACT achieves a good balance between accuracy and
computation cost (FLOPs). The code is available as supplementary material for ease
of experiment replication and verification. Code is released at
\url{https://github.com/gaopengcuhk/SMCA-DETR/}
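The prototype idea above can be illustrated with a minimal NumPy sketch, assuming a single attention head: queries are hashed into buckets with random-hyperplane LSH, each bucket's mean acts as a prototype, and attention is computed once per prototype instead of once per query. This is an illustrative approximation, not the authors' released implementation; the helper names (`lsh_buckets`, `clustered_attention`) are hypothetical, and ACT's actual multi-round hashing scheme is more refined.

```python
import numpy as np

def lsh_buckets(q, num_hashes=8, seed=0):
    """Hash each query vector into a bucket via random-hyperplane LSH.

    Queries whose sign patterns agree on all hyperplanes share a bucket,
    so similar queries tend to collide. (Hypothetical helper; ACT uses a
    more elaborate multi-round scheme.)
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((q.shape[-1], num_hashes))
    bits = (q @ planes) > 0                                  # (N, num_hashes)
    return (bits * (1 << np.arange(num_hashes))).sum(-1)     # bucket ids

def clustered_attention(q, k, v, num_hashes=8):
    """Approximate softmax(q k^T / sqrt(d)) v by attending per prototype.

    Queries in the same LSH bucket are replaced by their mean (the
    prototype), reducing the O(N^2) query-key cost to O(NK), where K is
    the number of non-empty buckets.
    """
    buckets = lsh_buckets(q, num_hashes)
    ids, inv = np.unique(buckets, return_inverse=True)
    # Prototype = mean of the queries assigned to each bucket.
    protos = np.stack([q[inv == i].mean(0) for i in range(len(ids))])
    scores = protos @ k.T / np.sqrt(q.shape[-1])             # (K, N)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out_protos = weights @ v                                 # (K, d)
    return out_protos[inv]                                   # back to queries
```

Because all queries in a bucket share one prototype, they also share one output row; the approximation error shrinks as the hash resolution (and hence K) grows, trading FLOPs for accuracy.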
Related papers
- ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers [0.0]
Transformers demonstrate competitive performance in terms of precision on the problem of vision-based object detection.
We propose to cluster the transformer input on the basis of its entropy.
Clustering reduces the size of data given as input to the transformer and therefore reduces training time and GPU memory usage.
arXiv Detail & Related papers (2024-09-11T18:03:59Z)
- Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset on real-world lidar data.
arXiv Detail & Related papers (2022-09-30T06:35:43Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Oriented Object Detection with Transformer [51.634913687632604]
We implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network.
We design a simple but highly efficient encoder for Transformer by replacing the attention mechanism with depthwise separable convolution.
Our $\rm O^2DETR$ can be another new benchmark in the field of oriented object detection, achieving up to a 3.85 mAP improvement over Faster R-CNN and RetinaNet.
arXiv Detail & Related papers (2021-06-06T14:57:17Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
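The set-based global loss in DETR above hinges on a one-to-one bipartite matching between predictions and ground-truth objects. A minimal sketch of that matching step, using only an L1 box-distance cost and SciPy's Hungarian solver (DETR's real cost also includes a classification term and a generalized-IoU term; the function name `match_predictions` is hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes, gt_boxes):
    """One-to-one matching between predicted and ground-truth boxes.

    Builds a pairwise cost matrix (here just the L1 distance between
    box coordinates, omitting DETR's classification and GIoU terms)
    and solves the assignment with the Hungarian algorithm.
    """
    # cost[i, j] = L1 distance between prediction i and ground truth j
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
```

Because the matching is one-to-one, each ground-truth object penalizes exactly one prediction, which is what removes the need for non-maximum suppression and other hand-designed post-processing.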
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.