Decoupled DETR: Spatially Disentangling Localization and Classification
for Improved End-to-End Object Detection
- URL: http://arxiv.org/abs/2310.15955v1
- Date: Tue, 24 Oct 2023 15:54:11 GMT
- Title: Decoupled DETR: Spatially Disentangling Localization and Classification
for Improved End-to-End Object Detection
- Authors: Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
- Abstract summary: We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
- Score: 48.429555904690595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The introduction of DETR represents a new paradigm for object detection.
However, its decoder conducts classification and box localization using shared
queries and cross-attention layers, leading to suboptimal results. We observe
that different regions of interest in the visual feature map are suitable for
performing query classification and box localization tasks, even for the same
object. Salient regions provide vital information for classification, while the
boundaries around them are more favorable for box regression. Unfortunately,
such spatial misalignment between these two tasks greatly hinders DETR's
training. Therefore, in this work, we focus on decoupling localization and
classification tasks in DETR. To achieve this, we introduce a new design scheme
called spatially decoupled DETR (SD-DETR), which includes a task-aware query
generation module and a disentangled feature learning process. We elaborately
design the task-aware query initialization process and divide the
cross-attention block in the decoder to allow the task-aware queries to match
different visual regions. Meanwhile, we also observe a prediction misalignment
problem between high classification confidence and precise localization, so we
propose an alignment loss to further guide the training of the spatially
decoupled DETR. Through extensive experiments, we demonstrate that our approach
achieves a significant improvement on the MSCOCO dataset compared to previous
work. For instance, we improve the performance of
Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our
method overcomes the misalignment problem and greatly improves the performance
of DETR for object detection.
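As a concrete illustration of the spatial decoupling idea, the sketch below (written in PyTorch, and not the authors' released code) shows a decoder layer in which classification and box regression use separate task-aware queries and separate cross-attention blocks, so each task can attend to different visual regions. The class count and head shapes are assumptions for illustration only.

import torch
import torch.nn as nn

class SpatiallyDecoupledDecoderLayer(nn.Module):
    """Minimal sketch: separate cross-attention for classification and box
    regression, so the two tasks can attend to different image regions."""
    def __init__(self, d_model=256, n_heads=8, num_classes=91):
        super().__init__()
        # one cross-attention block per task instead of a single shared one
        self.cls_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.box_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cls_head = nn.Linear(d_model, num_classes)  # class logits
        self.box_head = nn.Linear(d_model, 4)            # (cx, cy, w, h) in [0, 1]

    def forward(self, cls_queries, box_queries, memory):
        # Each task-aware query set attends to the encoder memory on its own:
        # salient regions can drive classification while boundary regions
        # drive box regression.
        cls_feat, _ = self.cls_cross_attn(cls_queries, memory, memory)
        box_feat, _ = self.box_cross_attn(box_queries, memory, memory)
        return self.cls_head(cls_feat), self.box_head(box_feat).sigmoid()

In the full SD-DETR design the two query sets would come from the task-aware query generation module; here they are simply passed in as two inputs.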
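The abstract does not spell out the exact form of the alignment loss. A common way to couple classification confidence with localization quality, used here purely as an assumed stand-in, is to scale the positive classification target by the predicted box's IoU with its matched ground truth:

import torch
import torch.nn.functional as F

def iou_aware_alignment_loss(cls_logits, labels, ious):
    """Hypothetical alignment loss: the target for the matched class is scaled
    by localization quality (IoU), so high confidence is only rewarded when the
    box is also precise.
    cls_logits: [N, C] raw class logits for matched queries
    labels:     [N]    ground-truth class indices
    ious:       [N]    IoU of each predicted box with its matched ground truth
    """
    targets = F.one_hot(labels, num_classes=cls_logits.shape[-1]).float()
    targets = targets * ious.unsqueeze(-1)  # down-weight poorly localized boxes
    return F.binary_cross_entropy_with_logits(cls_logits, targets)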
Related papers
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - Task-Specific Context Decoupling for Object Detection [27.078743716924752]
Existing methods usually leverage disentangled heads to learn a different feature context for each task.
We propose a novel Task-Specific COntext DEcoupling (TSCODE) head which further disentangles the feature encoding for two tasks.
Our method consistently improves different detectors by over 1.0 AP with less computational cost.
arXiv Detail & Related papers (2023-03-02T08:02:14Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Task-specific Inconsistency Alignment for Domain Adaptive Object
Detection [38.027790951157705]
Detectors trained with massive labeled data often exhibit dramatic performance degradation in scenarios with a data distribution gap.
We propose Task-specific Inconsistency Alignment (TIA), by developing a new alignment mechanism in separate task spaces.
TIA demonstrates superior results on various scenarios to the previous state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:36:33Z) - Salient Object Ranking with Position-Preserved Attention [44.94722064885407]
We study the Salient Object Ranking (SOR) task, which aims to assign a ranking order to each detected object according to its visual saliency.
We propose the first end-to-end framework of the SOR task and solve it in a multi-task learning fashion.
We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch.
arXiv Detail & Related papers (2021-06-09T13:00:05Z) - Learning to Relate Depth and Semantics for Unsupervised Domain
Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Modulating Localization and Classification for Harmonized Object
Detection [40.82723262074911]
We propose a mutual learning framework to modulate the two tasks.
In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy.
We achieve a significant performance gain over the baseline detectors on the COCO dataset.
arXiv Detail & Related papers (2021-03-16T10:36:02Z) - Pairwise Similarity Knowledge Transfer for Weakly Supervised Object
Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)