Towards Efficient Use of Multi-Scale Features in Transformer-Based
Object Detectors
- URL: http://arxiv.org/abs/2208.11356v2
- Date: Fri, 24 Mar 2023 02:06:36 GMT
- Title: Towards Efficient Use of Multi-Scale Features in Transformer-Based
Object Detectors
- Authors: Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang,
Shijian Lu
- Abstract summary: Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs.
We propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-scale features have been proven highly effective for object detection
but often come with huge and even prohibitive extra computation costs,
especially for the recent Transformer-based detectors. In this paper, we
propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm
that enables efficient use of multi-scale features in Transformer-based object
detectors. The core idea is to exploit sparse multi-scale features from just a
few crucial locations, and it is achieved with two novel designs. First, IMFA
rearranges the Transformer encoder-decoder pipeline so that the encoded
features can be iteratively updated based on the detection predictions. Second,
IMFA sparsely samples scale-adaptive features for refined detection from just a
few keypoint locations under the guidance of prior detection predictions. As a
result, the sampled multi-scale features are sparse yet still highly beneficial
for object detection. Extensive experiments show that the proposed IMFA boosts
the performance of multiple Transformer-based object detectors significantly
yet with only slight computational overhead.
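The sparse sampling idea in the abstract can be illustrated with a minimal sketch: given a box predicted in an earlier step, read features from every pyramid level at only a handful of points instead of processing full multi-scale maps. Everything named here is an assumption for illustration and does not come from the paper: the helpers `box_keypoints`, `bilinear_sample`, and `sample_sparse_multiscale`, the choice of box center plus edge midpoints as keypoints, and plain bilinear interpolation. The sketch simply gathers tokens from all levels; it does not implement the scale-adaptive selection or the iterative encoder update that IMFA describes.

```python
import math


def box_keypoints(cx, cy, bw, bh):
    """Hypothetical keypoint rule: box center plus the four edge midpoints.

    All values are normalized to [0, 1]; results are clamped to the image.
    """
    pts = [
        (cx, cy),
        (cx - bw / 2, cy),
        (cx + bw / 2, cy),
        (cx, cy - bh / 2),
        (cx, cy + bh / 2),
    ]
    return [(min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)) for x, y in pts]


def bilinear_sample(feature_map, x_norm, y_norm):
    """Sample one C-dim vector from a feature map stored as [C][H][W] lists."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    # Map normalized coordinates onto the pixel grid of this level.
    x = min(max(x_norm, 0.0), 1.0) * (w - 1)
    y = min(max(y_norm, 0.0), 1.0) * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    out = []
    for channel in feature_map:
        top = channel[y0][x0] * (1 - wx) + channel[y0][x1] * wx
        bottom = channel[y1][x0] * (1 - wx) + channel[y1][x1] * wx
        out.append(top * (1 - wy) + bottom * wy)
    return out


def sample_sparse_multiscale(pyramid, keypoints):
    """Return one token per (level, keypoint) pair, ordered level by level."""
    return [
        bilinear_sample(level, x, y)
        for level in pyramid
        for x, y in keypoints
    ]
```

With three pyramid levels and five keypoints per box, this yields 15 tokens per object regardless of image resolution, which is the sense in which the sampled multi-scale features stay sparse.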
Related papers
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
arXiv Detail & Related papers (2022-05-19T15:11:20Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- SRF-GAN: Super-Resolved Feature GAN for Multi-Scale Representation [5.634825161148483]
We propose a novel generator for super-resolving features of convolutional object detectors.
In this paper, we first design a super-resolved feature GAN (SRF-GAN) consisting of a detection-based generator and a feature patch discriminator.
Our SRF generator can substitute for the traditional methods and can be easily fine-tuned in combination with other conventional detectors.
arXiv Detail & Related papers (2020-11-17T06:27:32Z)