Can the Query-based Object Detector Be Designed with Fewer Stages?
- URL: http://arxiv.org/abs/2309.16306v1
- Date: Thu, 28 Sep 2023 09:58:52 GMT
- Title: Can the Query-based Object Detector Be Designed with Fewer Stages?
- Authors: Jialin Li, Weifu Fu, Yuhuan Lin, Qiang Nie, Yong Liu
- Abstract summary: We propose a novel model called GOLO, which follows a two-stage decoding paradigm.
Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance.
- Score: 15.726619371300558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query-based object detectors have made significant advancements since the
publication of DETR. However, most existing methods still rely on multi-stage
encoders and decoders, or a combination of both. Despite achieving high
accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers
from issues such as heavy computational burden, prompting us to reconsider its
necessity. In this paper, we explore multiple techniques to enhance query-based
detectors and, based on these findings, propose a novel model called GOLO
(Global Once and Local Once), which follows a two-stage decoding paradigm.
Compared to other mainstream query-based models with multi-stage decoders, our
model employs fewer decoder stages while still achieving considerable
performance. Experimental results on the COCO dataset demonstrate the
effectiveness of our approach.
Related papers
- Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks [9.207022068713867]
We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks.
Our attacks operate in the encoder embedding space and at the downstream task output level.
Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly.
arXiv Detail & Related papers (2024-07-17T14:12:34Z)
- Tailored Design of Audio-Visual Speech Recognition Models using Branchformers [0.0]
We propose a novel framework for the design of parameter-efficient Audio-Visual Speech Recognition systems.
To be more precise, the proposed framework consists of two steps: first, estimating audio- and video-only systems, and then designing a tailored audio-visual unified encoder.
Results reflect how our tailored AVSR system is able to reach state-of-the-art recognition rates.
arXiv Detail & Related papers (2024-07-09T07:15:56Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- Retriever and Ranker Framework with Probabilistic Hard Negative Sampling for Code Search [11.39443308694887]
We introduce a cross-encoder architecture for code search that jointly encodes the semantic matching of query and code.
We also introduce a Retriever-Ranker framework that cascades the dual-encoder and cross-encoder to promote the efficiency of evaluation and online serving.
arXiv Detail & Related papers (2023-05-08T07:04:28Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection [65.74032877197844]
One-stage detectors are more efficient owing to straightforward architectures, but the two-stage detectors still take the lead in accuracy.
We propose MimicDet, a novel framework to train a one-stage detector by directly mimicking the two-stage features.
MimicDet has a shared backbone for the one-stage and two-stage detectors, which then branches into two heads designed to have compatible features for mimicking.
arXiv Detail & Related papers (2020-09-24T07:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.