Guiding Query Position and Performing Similar Attention for
Transformer-Based Detection Heads
- URL: http://arxiv.org/abs/2108.09691v1
- Date: Sun, 22 Aug 2021 11:32:34 GMT
- Title: Guiding Query Position and Performing Similar Attention for
Transformer-Based Detection Heads
- Authors: Xiaohu Jiang and Ze Chen and Zhicheng Wang and Erjin Zhou and ChunYuan
- Abstract summary: We propose the Guided Query Position (GQPos) method to embed the latest location information of object queries to query position iteratively.
Besides fusing feature maps, SiA also fuses attention weight maps to accelerate the learning of the high-resolution attention weight map.
Our experiments show that the proposed GQPos improves the performance of a series of models, including DETR, SMCA, YoloS, and HoiTransformer.
- Score: 20.759022922347697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: After DETR was proposed, this novel transformer-based detection paradigm
which performs several cross-attentions between object queries and feature maps
for predictions has subsequently derived a series of transformer-based
detection heads. These models iterate object queries after each
cross-attention. However, they do not renew the query position, which encodes
the object queries' location information. The model therefore needs extra
learning to figure out the newest regions that the query position should
express and attend to. To fix this issue, we propose the Guided Query Position
(GQPos) method to embed the latest location information of the object queries
into the query position iteratively.
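The GQPos idea of renewing the query position after each decoder layer can be sketched as follows. This is a minimal illustration, not the paper's code: the layer names, the linear box-to-position re-embedding, and the box head are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class GuidedQueryPosDecoder(nn.Module):
    """Sketch of GQPos: after each decoder layer, the query position
    embedding is refreshed from that layer's current box prediction,
    so the next cross-attention sees up-to-date location information."""

    def __init__(self, d_model=256, num_layers=3, num_queries=10):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(num_layers))
        self.box_head = nn.Linear(d_model, 4)   # (cx, cy, w, h) per query
        self.pos_embed = nn.Linear(4, d_model)  # box -> new query position
        self.query_pos0 = nn.Parameter(torch.zeros(num_queries, d_model))

    def forward(self, queries, memory):
        # queries: (B, num_queries, d_model); memory: (B, HW, d_model)
        query_pos = self.query_pos0.unsqueeze(0).expand(queries.size(0), -1, -1)
        boxes = []
        for layer in self.layers:
            queries = layer(queries + query_pos, memory)
            box = self.box_head(queries).sigmoid()
            query_pos = self.pos_embed(box)  # GQPos: renew the query position
            boxes.append(box)
        return queries, boxes
```

Without the `query_pos = self.pos_embed(box)` line, the position embedding would stay fixed across layers, which is exactly the situation the abstract describes.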
Another problem of such transformer-based detection heads is the high
complexity to perform attention on multi-scale feature maps, which hinders them
from improving detection performance at all scales. Therefore we propose a
novel fusion scheme named Similar Attention (SiA): besides fusing the feature
maps, SiA also fuses the attention weight maps, so the well-learned
low-resolution attention weight map accelerates the learning of the
high-resolution one.
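One way to read the SiA fusion is that low-resolution attention logits are upsampled to the high-resolution grid and combined with the high-resolution logits before the softmax. The sketch below is an interpretation under that assumption; the function name, shapes, and additive fusion are illustrative, not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def similar_attention(q, k_low, k_high, hw_low, hw_high):
    """Sketch of SiA-style fusion: attention logits computed on a
    low-resolution feature map are upsampled and added to the
    high-resolution logits, so the high-resolution attention starts
    from the well-learned low-resolution pattern."""
    d = q.size(-1)
    # low-resolution attention logits: (B, Q, Hl*Wl)
    logits_low = q @ k_low.transpose(-2, -1) / d ** 0.5
    B, Q, _ = logits_low.shape
    Hl, Wl = hw_low
    Hh, Wh = hw_high
    # upsample the low-resolution weight map to the high-resolution grid
    up = F.interpolate(logits_low.view(B * Q, 1, Hl, Wl),
                       size=(Hh, Wh), mode='bilinear', align_corners=False)
    up = up.view(B, Q, Hh * Wh)
    # high-resolution logits, fused with the upsampled low-resolution ones
    logits_high = q @ k_high.transpose(-2, -1) / d ** 0.5
    return (logits_high + up).softmax(-1)
```

The upsampling step is cheap relative to attention itself, which is consistent with the abstract's motivation of taming the cost of multi-scale attention.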
Our experiments show that the proposed GQPos improves the performance of a
series of models, including DETR, SMCA, YoloS, and HoiTransformer, and that SiA
consistently improves the performance of multi-scale transformer-based
detection heads like DETR and HoiTransformer.
Related papers
- SEED: A Simple and Effective 3D DETR in Point Clouds [72.74016394325675]
We argue that the main challenges stem from the high sparsity and uneven distribution of point clouds.
We propose a simple and effective 3D DETR method (SEED) for detecting 3D objects from point clouds.
arXiv Detail & Related papers (2024-07-15T14:21:07Z) - Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction [15.324464723174533]
This paper introduces MapQR, an end-to-end method with an emphasis on enhancing query capabilities for constructing online vectorized maps.
MapQR utilizes a novel query design, called scatter-and-gather query, which is modelled by separate content and position parts explicitly.
The proposed MapQR achieves the best mean average precision (mAP) and maintains good efficiency on both nuScenes and Argoverse 2.
arXiv Detail & Related papers (2024-02-27T11:43:09Z) - Decoupled DETR: Spatially Disentangling Localization and Classification
for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z) - InsMapper: Exploring Inner-instance Information for Vectorized HD
Mapping [41.59891369655983]
InsMapper harnesses inner-instance information for vectorized high-definition mapping through transformers.
InsMapper surpasses the previous state-of-the-art method, demonstrating its effectiveness and generality.
arXiv Detail & Related papers (2023-08-16T17:58:28Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Object Detection with Transformers: A Review [11.255962936937744]
This paper provides a comprehensive review of 21 recently proposed advancements in the original DETR model.
We conduct a comparative analysis across various detection transformers, evaluating their performance and network architectures.
We hope that this study will ignite further interest among researchers in addressing the existing challenges and exploring the application of transformers in the object detection domain.
arXiv Detail & Related papers (2023-06-07T16:13:38Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot
Object Detection [35.54153749138406]
We propose a Time-rEversed diffusioN tEnsor Transformer (TENET) that captures multi-way feature occurrences that are highly discriminative.
We also propose a Transformer Relation Head (TRH) equipped with higher-order representations, which encodes correlations between query regions and the entire support set.
Our model achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO.
arXiv Detail & Related papers (2022-10-30T17:40:12Z) - Transformers for Object Detection in Large Point Clouds [9.287964414592826]
We present TransLPC, a novel detection model for large point clouds based on a transformer architecture.
We propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries.
This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset on real-world lidar data.
arXiv Detail & Related papers (2022-09-30T06:35:43Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, include a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.