Related papers: AdaMixer: A Fast-Converging Query-Based Object Detector

URL: http://arxiv.org/abs/2203.16507v2
Date: Thu, 31 Mar 2022 10:22:26 GMT
Title: AdaMixer: A Fast-Converging Query-Based Object Detector
Authors: Ziteng Gao, Limin Wang, Bing Han, Sheng Guo
Abstract summary: We propose a fast-converging query-based object detector named AdaMixer. AdaMixer has architectural simplicity without requiring explicit pyramid networks. Our work sheds light on a simple, accurate, and fast converging architecture for query-based object detectors.
Score: 32.159871347459166
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional object detectors employ the dense paradigm of scanning over locations and scales in an image. The recent query-based object detectors break this convention by decoding image features with a set of learnable queries. However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. Accordingly, we propose a fast-converging query-based detector, named AdaMixer, by improving the adaptability of query-based decoding processes in two aspects. First, each query adaptively samples features over space and scales based on estimated offsets, which allows AdaMixer to efficiently attend to the coherent regions of objects. Then, we dynamically decode these sampled features with an adaptive MLP-Mixer under the guidance of each query. Thanks to these two critical designs, AdaMixer enjoys architectural simplicity without requiring dense attentional encoders or explicit pyramid networks. On the challenging MS COCO benchmark, AdaMixer with ResNet-50 as the backbone, with 12 training epochs, reaches up to 45.0 AP on the validation set along with 27.9 APs in detecting small objects. With the longer training scheme, AdaMixer with ResNeXt-101-DCN and Swin-S reaches 49.5 and 51.3 AP. Our work sheds light on a simple, accurate, and fast converging architecture for query-based object detectors. The code is made available at https://github.com/MCG-NJU/AdaMixer
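The second design in the abstract, decoding sampled features with an adaptive MLP-Mixer guided by each query, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). It assumes the features for one query have already been sampled into a P_in x C matrix, omits normalization and residual connections, and uses hypothetical names throughout (`adaptive_mixing`, `w_channel`, `w_spatial`, `linear_to_matrix`).

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(0.0, v) for v in row] for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def linear_to_matrix(weight, query, rows, cols):
    """Map a query vector to a (rows x cols) matrix with a bias-free linear layer."""
    flat = [sum(w * q for w, q in zip(w_row, query)) for w_row in weight]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def adaptive_mixing(sampled, query, w_channel, w_spatial, p_out):
    """Mix sampled features (P_in x C) with query-generated weights; returns P_out x C."""
    p_in, c = len(sampled), len(sampled[0])
    # Channel mixing: the query generates a C x C matrix shared by all sampled points.
    m_c = linear_to_matrix(w_channel, query, c, c)
    x = relu(matmul(sampled, m_c))                      # P_in x C
    # Spatial mixing: the query generates a P_in x P_out matrix that mixes across points.
    m_s = linear_to_matrix(w_spatial, query, p_in, p_out)
    return relu(transpose(matmul(transpose(x), m_s)))   # P_out x C

# Toy sizes (assumptions for illustration only).
random.seed(0)
P_IN, P_OUT, C, D = 4, 3, 2, 5

def rand_matrix(n, m):
    return [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

sampled = rand_matrix(P_IN, C)            # features gathered for one query
query = [random.uniform(-1, 1) for _ in range(D)]
w_channel = rand_matrix(C * C, D)         # learnable in a real model
w_spatial = rand_matrix(P_IN * P_OUT, D)  # learnable in a real model

out = adaptive_mixing(sampled, query, w_channel, w_spatial, P_OUT)
print(len(out), len(out[0]))  # 3 2
```

Because both mixing matrices are produced from the query rather than stored as fixed weights, two queries looking at the same sampled features decode them differently, which is the kind of adaptability the abstract refers to.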

Related papers

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM [81.75988648572347]
We present DetToolChain, a novel prompting paradigm to unleash the zero-shot object detection ability of multimodal large language models (MLLMs). Our approach consists of a detection prompting toolkit inspired by high-precision detection priors and a new Chain-of-Thought to implement these prompts. We show that GPT-4V with our DetToolChain improves state-of-the-art object detectors by +21.5% AP50 on the MS COCO Novel class set for open-vocabulary detection.
arXiv Detail & Related papers (2024-03-19T06:54:33Z)
Deep Equilibrium Object Detection [24.69829309391189]
We present a new query-based object detector (DEQDet) by designing a deep equilibrium decoder. Our experiments demonstrate DEQDet converges faster, consumes less memory, and achieves better results than the baseline counterpart.
arXiv Detail & Related papers (2023-08-18T13:56:03Z)
ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation [50.01244854344167]
We bridge the performance gap between sparse and dense detectors by proposing Adaptive Sparse Anchor Generator (ASAG). ASAG predicts dynamic anchors on patches rather than grids in a sparse way so that it alleviates the feature conflict problem. Our method outperforms dense-initialized ones and achieves a better speed-accuracy trade-off.
arXiv Detail & Related papers (2023-08-18T02:06:49Z)
StageInteractor: Query-based Object Detector with Cross-stage Interaction [21.84964476813102]
We propose a new query-based object detector with cross-stage interaction, coined as StageInteractor. Our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
arXiv Detail & Related papers (2023-04-11T04:50:13Z)
Dense Distinct Query for End-to-End Object Detection [39.32011383066249]
One-to-one assignment in object detection has successfully obviated the need for non-maximum suppression. This paper shows that the solution should be Dense Distinct Queries (DDQ). DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors.
arXiv Detail & Related papers (2023-03-22T17:42:22Z)
Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion [95.7732308775325]
The proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We design Semantic-Aligned-Matching DETR++ to accelerate DETR's convergence and improve detection performance.
arXiv Detail & Related papers (2022-07-28T15:34:29Z)
EAutoDet: Efficient Architecture Search for Object Detection [110.99532343155073]
EAutoDet framework can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days. We propose a kernel reusing technique by sharing the weights of candidate operations on one edge and consolidating them into one convolution. In particular, the discovered architectures surpass state-of-the-art object detection NAS methods and achieve 40.1 mAP with 120 FPS and 49.2 mAP with 41.3 FPS on COCO test-dev set.
arXiv Detail & Related papers (2022-03-21T05:56:12Z)
Anchor DETR: Query Design for Transformer-Based Detector [24.925317590675203]
We propose a novel query design for transformer-based detectors. Object queries are based on anchor points, which are widely used in CNN-based detectors. Our design can predict multiple objects at one position to solve the difficulty: "one region, multiple objects".
arXiv Detail & Related papers (2021-09-15T06:31:55Z)
Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding. However, the current training pipeline for dense detectors is compromised to lots of conjunctions that may not hold. We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution [27.67084901207291]
We explore the mechanism of looking and thinking twice in the backbone design for object detection. At the macro level, we propose Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks. At the micro level, we propose Switchable Atrous Convolution, which convolves the features with different atrous rates and gathers the results.
arXiv Detail & Related papers (2020-06-03T15:28:16Z)
End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.