Guiding Query Position and Performing Similar Attention for
Transformer-Based Detection Heads
- URL: http://arxiv.org/abs/2108.09691v1
- Date: Sun, 22 Aug 2021 11:32:34 GMT
- Title: Guiding Query Position and Performing Similar Attention for
Transformer-Based Detection Heads
- Authors: Xiaohu Jiang and Ze Chen and Zhicheng Wang and Erjin Zhou and ChunYuan
- Abstract summary: We propose the Guided Query Position (GQPos) method to embed the latest location information of object queries into the query position iteratively.
Besides fusing the feature maps, SiA also fuses the attention weight maps to accelerate the learning of the high-resolution attention weight map.
Our experiments show that the proposed GQPos improves the performance of a series of models, including DETR, SMCA, YoloS, and HoiTransformer.
- Score: 20.759022922347697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: After DETR was proposed, this novel transformer-based detection paradigm
which performs several cross-attentions between object queries and feature maps
for predictions has subsequently derived a series of transformer-based
detection heads. These models iterate object queries after each
cross-attention. However, they do not renew the query position, which encodes
the object queries' positional information. The model therefore needs extra
learning to figure out the newest regions that the query position should
express and attend to. To fix this issue, we propose the Guided Query Position
(GQPos) method to embed the latest location information of the object queries
into the query position iteratively.
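The iterative renewal described above can be sketched as follows. This is a minimal numpy sketch, not the paper's implementation: the sinusoidal embedding, the stub decoder layers, and the function names (`sine_pos_embed`, `decoder_with_gqpos`, `box_head`) are all hypothetical illustrations of the idea that the query position is re-derived from the latest predicted locations after every cross-attention layer, rather than kept fixed.

```python
import numpy as np

def sine_pos_embed(xy, dim=64):
    """Sinusoidal embedding of normalized (x, y) centers -> (N, dim)."""
    freqs = 1.0 / (10000 ** (np.arange(dim // 4) / (dim // 4)))
    x = xy[:, 0:1] * freqs  # (N, dim//4)
    y = xy[:, 1:2] * freqs
    return np.concatenate([np.sin(x), np.cos(x), np.sin(y), np.cos(y)], axis=1)

def decoder_with_gqpos(queries, layers, box_head):
    """Each layer refines the queries; GQPos re-derives query_pos from the
    latest predicted centers instead of reusing the initial embedding."""
    xy = np.full((queries.shape[0], 2), 0.5)   # initial reference points
    query_pos = sine_pos_embed(xy)
    for layer in layers:
        queries = layer(queries + query_pos)   # cross-attention step (stub)
        xy = box_head(queries)                 # latest predicted box centers
        query_pos = sine_pos_embed(xy)         # GQPos: renew query position
    return queries, xy
```

Without the last line of the loop, the position embedding would keep pointing at the initial reference points, which is the mismatch the method addresses.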
Another problem of such transformer-based detection heads is the high
complexity to perform attention on multi-scale feature maps, which hinders them
from improving detection performance at all scales. Therefore, we propose a
novel fusion scheme named Similar Attention (SiA): besides fusing the feature
maps, SiA also fuses the attention weight maps, so that the well-learned
low-resolution attention weight map accelerates the learning of the
high-resolution attention weight map.
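The cross-scale fusion of attention weight maps can be sketched as below. This is a hypothetical numpy illustration, assuming nearest-neighbor upsampling and an equal-weight blend; the actual fusion operator and weighting in SiA are not specified here and the names (`upsample2x`, `sia_fuse`) are invented for the sketch.

```python
import numpy as np

def upsample2x(attn):
    """Nearest-neighbor 2x upsampling of an (H, W) attention map."""
    return attn.repeat(2, axis=0).repeat(2, axis=1)

def sia_fuse(attn_low, attn_high_logits):
    """Similar Attention (sketch): use the well-learned low-resolution
    attention map as a prior for the high-resolution one, then renormalize."""
    prior = upsample2x(attn_low)
    prior /= prior.sum()
    # Softmax over the high-resolution logits.
    high = np.exp(attn_high_logits - attn_high_logits.max())
    high /= high.sum()
    fused = 0.5 * prior + 0.5 * high  # hypothetical fusion weighting
    return fused / fused.sum()
```

The point of the sketch is the data flow: the low-resolution map, which converges faster, supplies a prior that the high-resolution map does not have to learn from scratch.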
Our experiments show that the proposed GQPos improves the performance of a
series of models, including DETR, SMCA, YoloS, and HoiTransformer, and that SiA
consistently improves the performance of multi-scale transformer-based
detection heads such as DETR and HoiTransformer.
Related papers
- Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection [82.65760006883248]
We introduce a new task named Change Detection Question Answering and Grounding (CDQAG)
CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence.
We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
arXiv Detail & Related papers (2024-10-31T11:20:13Z) - OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images [26.37802649901314]
Oriented object detection in remote sensing images is a challenging task because objects appear in arbitrary orientations.
We propose an end-to-end transformer-based oriented object detector consisting of three dedicated modules to address these issues.
Compared with previous end-to-end detectors, the OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$\times$ to 1$\times$.
arXiv Detail & Related papers (2024-09-29T10:36:33Z) - SEED: A Simple and Effective 3D DETR in Point Clouds [72.74016394325675]
We argue that the main challenges stem from the high sparsity and uneven distribution of point clouds.
We propose a simple and effective 3D DETR method (SEED) for detecting 3D objects from point clouds.
arXiv Detail & Related papers (2024-07-15T14:21:07Z) - Decoupled DETR: Spatially Disentangling Localization and Classification
for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z) - InsMapper: Exploring Inner-instance Information for Vectorized HD
Mapping [41.59891369655983]
InsMapper harnesses inner-instance information for vectorized high-definition mapping through transformers.
InsMapper surpasses the previous state-of-the-art method, demonstrating its effectiveness and generality.
arXiv Detail & Related papers (2023-08-16T17:58:28Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, distinguishing the positive query from other highly similar queries that are not the best match poses a challenge for the network.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z) - Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot
Object Detection [35.54153749138406]
We propose a Time-rEversed diffusioN tEnsor Transformer (TENET) that captures multi-way feature occurrences that are highly discriminative.
We also propose a Transformer Relation Head (TRH) equipped with higher-order representations, which encodes correlations between query regions and the entire support set.
Our model achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO.
arXiv Detail & Related papers (2022-10-30T17:40:12Z) - Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset on real-world lidar data.
arXiv Detail & Related papers (2022-09-30T06:35:43Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.