Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable
- URL: http://arxiv.org/abs/2412.02054v1
- Date: Tue, 03 Dec 2024 00:26:04 GMT
- Title: Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable
- Authors: Lizhen Xu, Shanmin Pang, Wenzhao Qiu, Zehao Wu, Xiuxiu Bai, Kuizhi Mei, Jianru Xue
- Abstract summary: We propose an approach called Gradually Pruning Queries (GPQ).
GPQ prunes queries incrementally based on their classification scores.
It achieves up to a 67.86% reduction in FLOPs and a 76.38% decrease in inference time.
- Abstract: Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. However, despite their popularity, these models often require an excessive number of object queries, far surpassing the actual number of objects to detect. The redundant queries result in unnecessary computational and memory costs. In this paper, we find that not all queries contribute equally: a significant portion of queries have a much smaller impact compared to others. Based on this observation, we propose an embarrassingly simple approach called Gradually Pruning Queries (GPQ), which prunes queries incrementally based on their classification scores. It is straightforward to implement in any query-based method, as it can be seamlessly integrated as a fine-tuning step using an existing checkpoint after training. With GPQ, users can easily generate multiple models with fewer queries, starting from a checkpoint with an excessive number of queries. Experiments on various advanced 3D detectors show that GPQ effectively reduces redundant queries while maintaining performance. Using our method, model inference on desktop GPUs can be accelerated by up to 1.31x. Moreover, after deployment on edge devices, it achieves up to a 67.86% reduction in FLOPs and a 76.38% decrease in inference time. The code will be available at https://github.com/iseri27/Gpq.
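The abstract describes pruning queries incrementally by classification score but gives no procedure, so the following is only a minimal sketch of that idea, not the authors' implementation. It assumes each query has a single scalar score, that the lowest-scoring queries are the ones removed, and that a fixed number is dropped per step. The names `prune_step`, `gradually_prune`, `target`, and `per_step` are hypothetical.

```python
from typing import List, Tuple


def prune_step(scores: List[float], keep: List[int], n_drop: int) -> List[int]:
    """Drop the n_drop lowest-scoring queries among `keep` (assumed rule)."""
    ranked = sorted(keep, key=lambda i: scores[i])  # ascending by score
    dropped = set(ranked[:n_drop])
    # Preserve the original ordering of the surviving query indices.
    return [i for i in keep if i not in dropped]


def gradually_prune(
    scores: List[float], target: int, per_step: int
) -> Tuple[List[int], List[int]]:
    """Shrink the query set to `target` queries, `per_step` at a time.

    Returns the surviving query indices and the query count after each step.
    """
    if target < 0 or per_step <= 0:
        raise ValueError("target must be >= 0 and per_step must be > 0")
    keep = list(range(len(scores)))
    schedule = [len(keep)]
    while len(keep) > target:
        n_drop = min(per_step, len(keep) - target)
        keep = prune_step(scores, keep, n_drop)
        # Assumption: in a fine-tuning loop, training would continue here and
        # the scores would be re-estimated before the next pruning step.
        schedule.append(len(keep))
    return keep, schedule


if __name__ == "__main__":
    demo_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
    print(gradually_prune(demo_scores, target=2, per_step=2))
```

Pruning in small steps rather than all at once leaves room for fine-tuning between steps, which matches the abstract's description of GPQ as a fine-tuning stage applied to an existing checkpoint.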
Related papers
- Is Complex Query Answering Really Complex? [28.8459899849641]
We show that the current benchmarks for CQA might not be as complex as we think.
We propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops.
arXiv Detail & Related papers (2024-10-16T13:19:03Z)
- DQ-DETR: DETR with Dynamic Query for Tiny Object Detection [29.559819542066236]
We present a simple yet effective model, named DQ-DETR, which consists of three different components.
DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries.
Our model outperforms previous CNN-based and DETR-like methods, achieving a state-of-the-art mAP of 30.2% on the AI-TOD-V2 dataset.
arXiv Detail & Related papers (2024-04-04T15:10:24Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
- Dense Distinct Query for End-to-End Object Detection [39.32011383066249]
One-to-one assignment in object detection has successfully obviated the need for non-maximum suppression.
This paper shows that the solution should be Dense Distinct Queries (DDQ).
DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors.
arXiv Detail & Related papers (2023-03-22T17:42:22Z)
- DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection [113.5418064456229]
We propose a Dynamic Ball Query (DBQ) network to adaptively select a subset of input points according to the input features.
It can be embedded into some state-of-the-art 3D detectors and trained in an end-to-end manner, which significantly reduces the computational cost.
arXiv Detail & Related papers (2022-07-22T07:08:42Z)
- What Are Expected Queries in End-to-End Object Detection? [28.393693394478724]
This paper shows that the expected queries should be Dense Distinct Queries (DDQ).
DDQ is stronger, more robust, and converges faster than previous methods.
It obtains 44.5 AP on the MS COCO detection dataset with only 12 epochs.
arXiv Detail & Related papers (2022-06-02T18:15:44Z)
- Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms.
The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)