Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- URL: http://arxiv.org/abs/2407.11699v1
- Date: Tue, 16 Jul 2024 13:17:07 GMT
- Title: Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan,
- Abstract summary: We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer)
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
- Score: 26.03892270020559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment object detection, following the verification of its statistical significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement, which further extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a significant improvement (+2.0% AP compared to DINO), state-of-the-art performance (51.7% AP for 1x and 52.1% AP for 2x settings), and a remarkably faster convergence speed (over 40% AP with only 2 training epochs) than existing DETR detectors on COCO val2017. Moreover, the proposed relation encoder serves as a universal plug-in-and-play component, bringing clear improvements for theoretically any DETR-like methods. Furthermore, we introduce a class-agnostic detection dataset, SA-Det-100k. The experimental results on the dataset illustrate that the proposed explicit position relation achieves a clear improvement of 1.3% AP, highlighting its potential towards universal object detection. The code and dataset are available at https://github.com/xiuqhou/Relation-DETR.
Related papers
- Improving Detection in Aerial Images by Capturing Inter-Object Relationships [7.729994373861261]
spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships.
In this work, we introduce a transformer-based approach to capture these inter-object relationships.
Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized.
arXiv Detail & Related papers (2024-04-05T14:39:13Z) - RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction [68.34355552090103]
This paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples.
We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions.
Experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios.
arXiv Detail & Related papers (2024-04-02T19:14:23Z) - PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection [26.843891792018447]
We present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods.
An anchor-free Quality Oriented Proposal Network (QOPN) is proposed with dynamic label assignment and attention-based decomposition.
A novel Adaptive Recognition Loss (ARL) offers guidance for the R-CNN head to focus on high-quality proposals.
arXiv Detail & Related papers (2023-12-16T18:04:56Z) - DRAG: Divergence-based Adaptive Aggregation in Federated learning on
Non-IID Data [11.830891255837788]
Local gradient descent (SGD) is a fundamental approach in achieving communication efficiency in Federated Learning (FL)
We introduce a novel metric called degree of divergence," quantifying the angle between the local gradient and the global reference direction.
We propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically drags" the received local updates toward the reference direction in each round without requiring extra communication overhead.
arXiv Detail & Related papers (2023-09-04T19:40:58Z) - Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image
Anomaly Detection [13.801572236048601]
FOcus-the-Discrepancy (FOD) can simultaneously spot the patch-wise, intra- and inter-discrepancies of anomalies.
In this paper, we propose a novel AD framework: FOcus-the-Discrepancy (FOD), which can simultaneously spot the patch-wise, intra- and inter-discrepancies of anomalies.
arXiv Detail & Related papers (2023-08-06T01:30:26Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer [4.137346786534721]
We introduce a Hausdorff distance-based cost for bipartite matching, which more accurately quantifies the discrepancy between predictions and ground truths.
We propose an adaptive query denoising method that employs bipartite matching to selectively eliminate noised queries that detract from model improvement.
arXiv Detail & Related papers (2023-05-12T16:42:54Z) - Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale
Feature Fusion [95.7732308775325]
The proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.
DETR suffers from slow training convergence, which hinders its applicability to various detection tasks.
We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
arXiv Detail & Related papers (2022-07-28T15:34:29Z) - Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z) - Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples.
Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z) - Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and the state-of-the-art results on the MPII datasets.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.