Related papers: Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection

URL: http://arxiv.org/abs/2505.05741v1
Date: Fri, 09 May 2025 02:44:06 GMT
Title: Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection
Authors: Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun
Abstract summary: Dome-DETR is a novel framework with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. It achieves state-of-the-art performance (+3.3 AP on AI-TOD-V2 and +2.5 AP on VisDrone) while maintaining low computational complexity and a compact model size.
Score: 7.16574066661446
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tiny object detection plays a vital role in drone surveillance, remote sensing, and autonomous systems, enabling the identification of small targets across vast landscapes. However, existing methods suffer from inefficient feature leverage and high computational costs due to redundant feature processing and rigid query allocation. To address these challenges, we propose Dome-DETR, a novel framework with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. To reduce feature redundancies, we introduce a lightweight Density-Focal Extractor (DeFE) to produce clustered compact foreground masks. Leveraging these masks, we incorporate Masked Window Attention Sparsification (MWAS) to focus computational resources on the most informative regions via sparse attention. Besides, we propose Progressive Adaptive Query Initialization (PAQI), which adaptively modulates query density across spatial areas for better query allocation. Extensive experiments demonstrate that Dome-DETR achieves state-of-the-art performance (+3.3 AP on AI-TOD-V2 and +2.5 AP on VisDrone) while maintaining low computational complexity and a compact model size. Code will be released upon acceptance.

Related papers

High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery [4.833513511627847]
Unmanned Aerial Vehicle-based Object Detection (UAV-OD) faces substantial challenges, including small target sizes, high-density distributions, and cluttered backgrounds in UAV imagery. We propose HEGS-DETR, a comprehensively enhanced, real-time Detection Transformer framework tailored for UAVs. Experiments on the VisDrone dataset demonstrate that HEGS-DETR achieves a 5.1% AP50 and 3.8% AP increase over the baseline, while maintaining real-time speed and reducing parameter count by 4M.
arXiv Detail & Related papers (2025-07-01T14:56:56Z)
An Efficient Aerial Image Detection with Variable Receptive Fields [0.0]
We propose a transformer-based detector incorporating three key components. VRF-DETR achieves 51.4% mAP50 and 31.8% mAP50:95 with only 13.5M parameters.
arXiv Detail & Related papers (2025-04-21T15:16:13Z)
SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection [15.03203094818889]
This paper proposes an efficient model, Small Object Detection Transformer (SO-DETR). The model comprises three key components: a dual-domain hybrid encoder, an enhanced query selection mechanism, and a knowledge distillation strategy. Experimental results on the VisDrone 2019-DET and UAVVaste datasets demonstrate that SO-DETR outperforms existing methods with similar computational demands.
arXiv Detail & Related papers (2025-04-11T13:47:37Z)
Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images [2.9138705529771123]
We present a novel enhancement to the YOLOv8 model, tailored for oriented object detection tasks. Our model features a wavelet transform-based C2f module for capturing associative features and an Adaptive Scale Feature Pyramid (ASFP) module that leverages P2 layer details. Our approach provides a more efficient architectural design than DecoupleNet, which has 23.3M parameters, all while maintaining detection accuracy.
arXiv Detail & Related papers (2024-12-17T05:45:48Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
ESOD: Efficient Small Object Detection on High-Resolution Images [36.80623357577051]
Small objects are usually sparsely distributed and locally clustered. Massive feature extraction computations are wasted on the non-target background area of images. We propose to reuse the detector's backbone to conduct feature-level object-seeking and patch-slicing.
arXiv Detail & Related papers (2024-07-23T12:21:23Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection. We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution. It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection. We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
FOVEA: Foveated Image Magnification for Autonomous Navigation [53.69803081925454]
We propose an attentional approach that elastically magnifies certain regions while maintaining a small input canvas. On the autonomous driving datasets Argoverse-HD and BDD100K, we show our proposed method boosts the detection AP over standard Faster R-CNN, with and without finetuning.
arXiv Detail & Related papers (2021-08-27T03:07:55Z)
Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors. To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network. Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z)
End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection. A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region. Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
PENet: Object Detection using Points Estimation in Aerial Images [9.33900415971554]
A novel network structure, Points Estimated Network (PENet), is proposed in this work to answer these challenges. PENet uses a Mask Resampling Module (MRM) to augment the imbalanced datasets, a coarse anchor-free detector (CPEN) to effectively predict the center points of the small object clusters, and a fine anchor-free detector (FPEN) to locate the precise positions of the small objects. Our experiments on the aerial datasets VisDrone and UAVDT showed that PENet achieved higher precision than existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-01-22T19:43:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.