Towards Light Weight Object Detection System
- URL: http://arxiv.org/abs/2210.03861v1
- Date: Sat, 8 Oct 2022 00:55:15 GMT
- Title: Towards Light Weight Object Detection System
- Authors: Dharma KC, Venkata Ravi Kiran Dayana, Meng-Lin Wu, Venkateswara Rao Cherukuri, Hau Hwang
- Abstract summary: We present an approximation of the self-attention layers used in the transformer architecture. We also present a method that uses a transformer encoder layer for multi-resolution feature fusion.
- Score: 6.535035773534901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are a popular choice for classification tasks and as backbones
for object detection tasks. However, their high latency brings challenges in
their adaptation to lightweight object detection systems. We present an
approximation of the self-attention layers used in the transformer
architecture. This approximation reduces the latency of the classification
system while incurring minimal loss in accuracy. We also present a method that
uses a transformer encoder layer for multi-resolution feature fusion. This
feature fusion improves the accuracy of the state-of-the-art lightweight object
detection system without significantly increasing the number of parameters.
Finally, we provide an abstraction for the transformer architecture called
Generalized Transformer (gFormer) that can guide the design of novel
transformer-like architectures.
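The abstract does not specify how the fusion layer is built, so what follows is a hypothetical sketch of the general idea, not the authors' implementation. Feature maps from several backbone resolutions (assumed here to be 8x8, 4x4 and 2x2 with C = 8 channels) are flattened into tokens and concatenated. They then pass through one pre-norm transformer encoder layer, so every position can attend to every resolution, and are split back into maps. The helper names (`fuse_multi_resolution`, `encoder_layer`) and the random weights are illustrative assumptions. The sketch also makes the cost visible: the attention score matrix is N x N for N total tokens (84 here), and this quadratic growth is a standard reason self-attention is expensive at high resolution.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Per-token normalization without learned scale/shift (kept minimal).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product attention; `scores` is (N, N).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def encoder_layer(x, p):
    # Hypothetical pre-norm encoder layer: attention and ReLU feed-forward, each residual.
    x = x + self_attention(layer_norm(x), p["w_q"], p["w_k"], p["w_v"])
    hidden = np.maximum(0.0, layer_norm(x) @ p["w_1"])
    return x + hidden @ p["w_2"]

def fuse_multi_resolution(feature_maps, p):
    """Fuse (H_i, W_i, C) maps by letting all their positions attend to each other."""
    shapes = [f.shape for f in feature_maps]
    tokens = np.concatenate([f.reshape(-1, f.shape[-1]) for f in feature_maps], axis=0)
    fused = encoder_layer(tokens, p)
    # Split the fused token sequence back into per-resolution maps.
    outputs, start = [], 0
    for h, w, c in shapes:
        outputs.append(fused[start:start + h * w].reshape(h, w, c))
        start += h * w
    return outputs

# Hypothetical example: three backbone levels with C = 8 channels, random weights.
rng = np.random.default_rng(0)
C = 8
params = {name: 0.1 * rng.standard_normal((C, C)) for name in ("w_q", "w_k", "w_v")}
params["w_1"] = 0.1 * rng.standard_normal((C, 4 * C))
params["w_2"] = 0.1 * rng.standard_normal((4 * C, C))
maps = [rng.standard_normal((s, s, C)) for s in (8, 4, 2)]

fused = fuse_multi_resolution(maps, params)
print([f.shape for f in fused])  # [(8, 8, 8), (4, 4, 8), (2, 2, 8)]
```

Pre-norm placement, a single attention head and the 4x feed-forward width are common defaults chosen here for brevity, not details taken from the paper.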
Related papers
- SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation [49.65221743520028]
We show that a transformer-based detector with scale-aware attention enables the plain detector SimPLR, whose backbone and detection head are both non-hierarchical and operate on single-scale features.
Compared to the multi-scale and single-scale state-of-the-art, our model scales much better with bigger capacity (self-supervised) models and more pre-training data.
(2023-10-09T17:59:26Z)

- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers [34.42710399235461]
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
They suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
(2023-03-26T20:50:58Z)

- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
(2023-01-06T18:52:12Z)

- Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy, as shown in an evaluation on real-world lidar data from the challenging nuScenes dataset.
(2022-09-30T06:35:43Z)

- Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors [49.83396285177385]
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs.
We propose Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors.
(2022-08-24T08:09:25Z)

- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
(2022-04-17T09:27:45Z)

- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
(2021-10-08T06:32:05Z)

- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
(2021-03-31T13:55:56Z)
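Several entries in this list revolve around multi-scale features, and the Hierarchical Point Attention summary describes Multi-Scale Attention (MS-A) building multi-scale tokens from a single-scale input. As a loose, hypothetical illustration of that idea only (not the MS-A operator, which works on point sets rather than an ordered sequence), the sketch pools a 1-D token sequence at several assumed factors (1, 2, 4) and lets each original token attend over the union of the pooled tokens. The function names and the choice of average pooling with no learned projections are assumptions made for brevity.

```python
import numpy as np

def avg_pool_tokens(x, factor):
    # Average each run of `factor` consecutive tokens; a ragged tail is dropped.
    n = (x.shape[0] // factor) * factor
    return x[:n].reshape(-1, factor, x.shape[1]).mean(axis=1)

def multi_scale_attention(x, factors=(1, 2, 4)):
    """x: (N, C) single-scale tokens.

    Queries are the original tokens; keys and values are the union of the
    tokens pooled at every factor, so each query sees fine and coarse context.
    No learned projections are used, to keep the sketch short.
    """
    kv = np.concatenate([avg_pool_tokens(x, f) for f in factors], axis=0)
    scores = x @ kv.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ kv

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 4))
out = multi_scale_attention(tokens)
print(out.shape)  # (16, 4)
```

With 16 input tokens and factors (1, 2, 4), each query attends over 16 + 8 + 4 = 28 key/value tokens, so coarse summaries are added at a modest extra cost.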