Deep Equilibrium Object Detection
- URL: http://arxiv.org/abs/2308.09564v1
- Date: Fri, 18 Aug 2023 13:56:03 GMT
- Title: Deep Equilibrium Object Detection
- Authors: Shuai Wang, Yao Teng, Limin Wang
- Abstract summary: We present a new query-based object detector (DEQDet) by designing a deep equilibrium decoder.
Our experiments demonstrate DEQDet converges faster, consumes less memory, and achieves better results than the baseline counterpart.
- Score: 24.69829309391189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query-based object detectors directly decode image features into object
instances with a set of learnable queries. These query vectors are
progressively refined to stable meaningful representations through a sequence
of decoder layers, and then used to directly predict object locations and
categories with simple FFN heads. In this paper, we present a new query-based
object detector (DEQDet) by designing a deep equilibrium decoder. Our DEQ
decoder models the query vector refinement as the fixed point solving of an
implicit layer and is equivalent to applying infinite steps of refinement.
To be more specific to object decoding, we use a two-step unrolled equilibrium
equation to explicitly capture the query vector refinement. Accordingly, we are
able to incorporate refinement awareness into the DEQ training with the inexact
gradient back-propagation (RAG). In addition, to stabilize the training of our
DEQDet and improve its generalization ability, we devise the deep supervision
scheme on the optimization path of DEQ with refinement-aware
perturbation (RAP). Our experiments demonstrate DEQDet converges faster,
consumes less memory, and achieves better results than the baseline counterpart
(AdaMixer). In particular, our DEQDet with ResNet50 backbone and 300 queries
achieves $49.5$ mAP and $33.0$ AP$_s$ on the MS COCO benchmark under a
$2\times$ training scheme (24 epochs).
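The abstract's central idea, that query refinement can be treated as solving for a fixed point rather than stacking a fixed number of decoder layers, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `refine` is a hypothetical toy contraction standing in for a decoder layer, and plain fixed-point iteration stands in for whatever solver the paper uses. It does not reproduce DEQDet, RAG, or RAP.

```python
import math

def refine(z, x, w=0.5):
    # Hypothetical refinement step (NOT the paper's decoder layer).
    # With |w| < 1 and tanh, this map is a contraction in z,
    # so repeated application converges to a unique fixed point.
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]

def solve_fixed_point(x, tol=1e-10, max_iter=1000):
    # Naive fixed-point iteration: apply refine until z stops changing.
    z = [0.0] * len(x)
    for step in range(1, max_iter + 1):
        z_next = refine(z, x)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, step
    return z, max_iter

# Toy "image features" that condition the refinement (made-up values).
features = [0.3, -1.2, 0.8]
z_star, steps = solve_fixed_point(features)

# At the equilibrium, one more refinement step leaves z essentially unchanged,
# which is the sense in which it matches "infinite" refinement steps.
gap = max(abs(a - b) for a, b in zip(refine(z_star, features), z_star))
print(f"converged in {steps} steps, residual after extra step: {gap:.2e}")
```

In this toy setting the equilibrium depends only on `features` and the refinement map, not on how many layers were unrolled, which is the property the abstract attributes to an implicit layer.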
Related papers
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
- Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection [49.27380156754935]
We find that the number of queries in DETRs must be adjusted manually; otherwise, performance degrades to varying degrees.
We propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem.
Our method is simple and effective, and in theory it can be plugged into any DETR to make it query-adaptive.
arXiv Detail & Related papers (2023-10-24T11:00:56Z)
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking for the bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
- Q-DETR: An Efficient Low-Bit Quantized Detection Transformer [50.00784028552792]
Through empirical analysis, we find that the bottlenecks of Q-DETR come from query information distortion.
We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR.
We introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
arXiv Detail & Related papers (2023-04-01T08:05:14Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the large and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- Enhanced Training of Query-Based Object Detection via Selective Query Recollection [35.3219210570517]
This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.
We design and present Selective Query Recollection, a simple and effective training strategy for query-based object detectors.
arXiv Detail & Related papers (2022-12-15T02:45:57Z)
- Learning Low-Rank Representations for Model Compression [6.721845345130468]
We propose a Low-Rank Representation Vector Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ algorithms in various tasks and architectures.
In our method, the compression ratio could be directly controlled by $m$, and the final accuracy is solely determined by $\tilde{d}$.
With a proper $\tilde{d}$, we evaluate $\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on ImageNet classification datasets, achieving 2.8%/1.0% top
arXiv Detail & Related papers (2022-11-21T12:15:28Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)