DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
- URL: http://arxiv.org/abs/2404.03507v2
- Date: Thu, 11 Apr 2024 18:54:24 GMT
- Title: DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
- Authors: Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng
- Abstract summary: We present a model named DQ-DETR, which consists of three components: categorical counting module, counting-guided feature enhancement, and dynamic query selection.
Our model outperforms previous CNN-based and DETR-like methods, achieving state-of-the-art mAP 30.2% on the AI-TOD-V2 dataset.
- Score: 29.559819542066236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of previous DETR-like methods in generic object detection, tiny object detection remains challenging for them, since the positional information of object queries is not customized for tiny objects, whose scale is far smaller than that of general objects. Moreover, DETR-like methods use a fixed number of queries, which makes them ill-suited to aerial datasets that contain only tiny objects and whose number of instances varies widely across images. We therefore present a simple yet effective model, named DQ-DETR, consisting of three components: a categorical counting module, counting-guided feature enhancement, and dynamic query selection, which together address the above problems. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and to improve the positional information of the queries. DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving a state-of-the-art 30.2% mAP on the AI-TOD-V2 dataset, which consists mostly of tiny objects.
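The dynamic query selection idea, i.e. adjusting the number of object queries from a predicted object count, can be sketched as follows. The count buckets and query budgets below are illustrative placeholders, not the values used in DQ-DETR:

```python
def select_num_queries(predicted_count: int) -> int:
    """Map a predicted object count to a query budget.

    The (max_count, num_queries) buckets below are hypothetical
    values for illustration, not DQ-DETR's actual configuration.
    """
    buckets = [(10, 100), (50, 300), (200, 600)]
    for max_count, num_queries in buckets:
        if predicted_count <= max_count:
            return num_queries
    return 900  # fall back to a large budget for very crowded images

# Sparse images get few queries; crowded aerial images get many.
print(select_num_queries(4))    # 100
print(select_num_queries(120))  # 600
```

The point of the mechanism is that a fixed query count either wastes computation on sparse images or under-covers crowded ones; conditioning the budget on a counting module's output avoids both.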
Related papers
- Visible and Clear: Finding Tiny Objects in Difference Map [50.54061010335082]
We introduce a self-reconstruction mechanism into the detection model and discover a strong correlation between it and tiny objects.
Specifically, we attach a reconstruction head to the neck of a detector and construct a difference map between the reconstructed image and the input, which is highly sensitive to tiny objects.
We further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny feature representation more clear.
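The core of the difference-map idea can be shown in a toy sketch: pixels that the reconstruction head fails to reproduce (often tiny objects) yield large per-pixel differences. This is only an illustration of the principle, not the paper's DGFE module:

```python
def difference_map(image, reconstruction):
    """Per-pixel absolute difference between an input image and its
    reconstruction; regions the model reconstructs poorly (often
    tiny objects) produce large values."""
    return [[abs(a - b) for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reconstruction)]

# Toy 2x3 grayscale image with one tiny bright object at (0, 2).
image = [[0.1, 0.1, 0.9],
         [0.1, 0.1, 0.1]]
recon = [[0.1, 0.1, 0.2],   # background reconstructed well, object poorly
         [0.1, 0.1, 0.1]]
dmap = difference_map(image, recon)
```

In this toy case the difference map is near zero everywhere except at the tiny object's location, which is the signal the guided feature enhancement exploits.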
arXiv Detail & Related papers (2024-05-18T12:22:26Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion fuses information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI)
Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention.
Remarkably, our PoIFusion achieves 74.9% NDS and 73.4% mAP, setting a state-of-the-art record on the multi-modal 3D object detection benchmark.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
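An instance-level contrastive objective of the kind described here is typically an InfoNCE-style loss over embedding similarities: pull an anchor embedding toward a positive of the same identity and away from negatives. The sketch below is a generic version under that assumption, not the paper's exact training scheme:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss over cosine similarities: the negative
    log-softmax probability of the positive among all candidates."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Anchor close to the positive -> small loss; close to a negative -> large.
loss_easy = info_nce([1, 0], [1, 0], [[0, 1]])
loss_hard = info_nce([1, 0], [0, 1], [[1, 0]])
```

For tracking, the positive would be another embedding of the same object instance (e.g. from an adjacent frame) and the negatives would be embeddings of other instances.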
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the large-scale and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- Few-shot Object Counting and Detection [25.61294147822642]
We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class.
This task uses the same supervision as few-shot object counting but additionally outputs object bounding boxes along with the total object count.
We introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR.
arXiv Detail & Related papers (2022-07-22T10:09:18Z)
- Dynamic Proposals for Efficient Object Detection [48.66093789652899]
We propose a simple yet effective method which is adaptive to different computational resources by generating dynamic proposals for object detection.
Our method achieves significant speed-up across a wide range of detection models including two-stage and query-based models.
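Generating proposals adaptively to the available compute can be sketched as choosing a proposal count that fits a latency budget. The linear latency model and all constants below are assumptions for illustration, not values from the paper:

```python
def num_proposals_for_budget(budget_ms: float,
                             latency_per_proposal_ms: float = 0.05,
                             base_latency_ms: float = 10.0,
                             min_p: int = 10, max_p: int = 300) -> int:
    """Choose a proposal count that fits a latency budget.

    Assumes a hypothetical linear cost model:
    total_latency = base_latency + n_proposals * per_proposal_latency.
    """
    spare = budget_ms - base_latency_ms
    if spare <= 0:
        return min_p  # no spare time: run with the minimum proposals
    n = round(spare / latency_per_proposal_ms)
    return max(min_p, min(max_p, n))

# Tight budgets yield few proposals, generous budgets yield many.
print(num_proposals_for_budget(11.0))  # 20
print(num_proposals_for_budget(25.0))  # 300
```

The same budget-to-count mapping applies whether the "proposals" are two-stage region proposals or query slots in a query-based detector.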
arXiv Detail & Related papers (2022-07-12T01:32:50Z)
- Object DGCNN: 3D Object Detection using Dynamic Graphs [32.090268859180334]
3D object detection often involves complicated training and testing pipelines.
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds.
arXiv Detail & Related papers (2021-10-13T17:59:38Z)
- Few-shot Object Detection on Remote Sensing Images [11.40135025181393]
We introduce a few-shot learning-based method for object detection on remote sensing images.
We build our few-shot object detection model upon YOLOv3 architecture and develop a multi-scale object detection framework.
arXiv Detail & Related papers (2020-06-14T07:18:10Z)
- Dynamic Refinement Network for Oriented and Densely Packed Object Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH)
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.