Task Specific Attention is one more thing you need for object detection
- URL: http://arxiv.org/abs/2202.09048v1
- Date: Fri, 18 Feb 2022 07:09:33 GMT
- Title: Task Specific Attention is one more thing you need for object detection
- Authors: Sang Yon Lee
- Abstract summary: We propose that combining several attention modules with our new Task Specific Split Transformer (TSST) is sufficient to produce the best COCO results.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various models have been proposed to solve the object detection problem.
However, most of them require many hand-designed components to demonstrate good
performance. To mitigate these issues, the Transformer-based DETR and its
variant Deformable DETR were suggested. They resolved much of the complexity
of designing an object detection head, but it has not been clear that
Transformer-based models can be considered the state-of-the-art method in
object detection beyond doubt. Furthermore, since DETR adopted the
Transformer only for the detection head while still including a CNN for the
backbone body, it has not been certain whether a competent end-to-end
pipeline could be built from a combination of attention modules. In this
paper, we propose that combining several attention modules with our new Task
Specific Split Transformer (TSST) is sufficient to produce the best COCO
results without traditionally hand-designed components. By splitting a
general-purpose attention module into two separate task-specific attention
modules, the proposed method offers a way to design simpler object detection
models than before. Extensive experiments on
the COCO benchmark demonstrate the effectiveness of our approach. Code is
released at https://github.com/navervision/tsst
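The abstract's central idea, splitting one general-purpose attention module into two task-specific attention modules, can be illustrated with a minimal NumPy sketch. This is an illustrative assumption about the split (here, one module feeding a classification head and one a box-regression head), not the paper's actual TSST implementation; all names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
d = 16
tokens = rng.standard_normal((10, d))  # 10 query tokens, dim 16

def make_proj():
    # Query/key/value projection matrices for one attention module.
    return [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]

# One projection set per task instead of a single shared module.
cls_proj, box_proj = make_proj(), make_proj()

def task_attention(x, proj):
    wq, wk, wv = proj
    return attention(x @ wq, x @ wk, x @ wv)

cls_feat = task_attention(tokens, cls_proj)  # features for the classification task
box_feat = task_attention(tokens, box_proj)  # features for the box-regression task
print(cls_feat.shape, box_feat.shape)  # (10, 16) (10, 16)
```

Both modules read the same tokens, but each learns its own projections, so the two tasks attend to different aspects of the input rather than sharing one generic attention map.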
Related papers
- MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z)
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Transformer-based Multi-Instance Learning for Weakly Supervised Object Detection [43.481591776038144]
Weakly Supervised Object Detection (WSOD) enables the training of object detection models using only image-level annotations.
We propose a novel backbone for WSOD based on our tailored Vision Transformer named Weakly Supervised Transformer Detection Network (WSTDN)
arXiv Detail & Related papers (2023-03-27T08:42:45Z)
- Rethinking the Detection Head Configuration for Traffic Object Detection [11.526701794026641]
We propose a lightweight traffic object detection network based on matching between detection head and object distribution.
The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.
arXiv Detail & Related papers (2022-10-08T02:23:57Z)
- BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z)
- Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
- Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net)
MPFP-Net differs from current models, which during training pursue only the most discriminative patches.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [41.050320861408046]
DETR suffers from slow convergence and limited feature spatial resolution.
We propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference.
Deformable DETR can achieve better performance than DETR with 10x fewer training epochs.
arXiv Detail & Related papers (2020-10-08T17:59:21Z)
- Condensing Two-stage Detection with Automatic Object Key Part Discovery [87.1034745775229]
Two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy.
We propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts.
Our proposed technique consistently maintains original performance while waiving around 50% of the model parameters of common two-stage detection heads.
arXiv Detail & Related papers (2020-06-10T01:20:47Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.