DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
- URL: http://arxiv.org/abs/2404.03507v2
- Date: Thu, 11 Apr 2024 18:54:24 GMT
- Title: DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
- Authors: Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng
- Abstract summary: We present a model named DQ-DETR, which consists of three components: categorical counting module, counting-guided feature enhancement, and dynamic query selection.
Our model outperforms previous CNN-based and DETR-like methods, achieving state-of-the-art mAP 30.2% on the AI-TOD-V2 dataset.
- Score: 29.559819542066236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of previous DETR-like methods in generic object detection, tiny object detection remains challenging for them, since the positional information of object queries is not customized for tiny objects, whose scale is far smaller than that of general objects. Moreover, DETR-like methods use a fixed number of queries, which makes them ill-suited to aerial datasets that contain only tiny objects and whose number of instances varies widely across images. We therefore present a simple yet effective model, named DQ-DETR, consisting of three components: a categorical counting module, counting-guided feature enhancement, and dynamic query selection, which together address the above problems. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and to improve the positional information of the queries. DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving a state-of-the-art 30.2% mAP on the AI-TOD-V2 dataset, which consists mostly of tiny objects.
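The dynamic query selection idea, i.e. adjusting the number of object queries from a predicted object count, can be sketched as follows. The count buckets and query budgets below are illustrative placeholders, not the values used in DQ-DETR:

```python
def select_num_queries(predicted_count: int) -> int:
    """Map a predicted object count to a query budget.

    The (max_count, num_queries) buckets below are hypothetical
    values for illustration, not DQ-DETR's actual configuration.
    """
    buckets = [(10, 100), (50, 300), (200, 600)]
    for max_count, num_queries in buckets:
        if predicted_count <= max_count:
            return num_queries
    return 900  # fall back to a large budget for very crowded images

# Sparse images get few queries; crowded aerial images get many.
print(select_num_queries(4))    # 100
print(select_num_queries(120))  # 600
```

The point of the mechanism is that a fixed query count either wastes computation on sparse images or under-covers crowded ones; conditioning the budget on a counting module's output avoids both.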
Related papers
- Visible and Clear: Finding Tiny Objects in Difference Map [50.54061010335082]
We introduce a self-reconstruction mechanism into the detection model and discover a strong correlation between it and tiny objects.
Specifically, we attach a reconstruction head to the neck of a detector and construct a difference map between the reconstructed image and the input, which is highly sensitive to tiny objects.
We further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny feature representation more clear.
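The core of the difference-map idea can be shown in a toy sketch: pixels that the reconstruction head fails to reproduce (often tiny objects) yield large per-pixel differences. This is only an illustration of the principle, not the paper's DGFE module:

```python
def difference_map(image, reconstruction):
    """Per-pixel absolute difference between an input image and its
    reconstruction; regions the model reconstructs poorly (often
    tiny objects) produce large values."""
    return [[abs(a - b) for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reconstruction)]

# Toy 2x3 grayscale image with one tiny bright object at (0, 2).
image = [[0.1, 0.1, 0.9],
         [0.1, 0.1, 0.1]]
recon = [[0.1, 0.1, 0.2],   # background reconstructed well, object poorly
         [0.1, 0.1, 0.1]]
dmap = difference_map(image, recon)
```

In this toy case the difference map is near zero everywhere except at the tiny object's location, which is the signal the guided feature enhancement exploits.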
arXiv Detail & Related papers (2024-05-18T12:22:26Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion fuses information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI)
Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention.
Remarkably, our PoIFusion achieves 74.9% NDS and 73.4% mAP, setting a state-of-the-art record on the multi-modal 3D object detection benchmark.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
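An instance-level contrastive objective of the kind described here is typically an InfoNCE-style loss over embedding similarities: pull an anchor embedding toward a positive of the same identity and away from negatives. The sketch below is a generic version under that assumption, not the paper's exact training scheme:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss over cosine similarities: the negative
    log-softmax probability of the positive among all candidates."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Anchor close to the positive -> small loss; close to a negative -> large.
loss_easy = info_nce([1, 0], [1, 0], [[0, 1]])
loss_hard = info_nce([1, 0], [0, 1], [[1, 0]])
```

For tracking, the positive would be another embedding of the same object instance (e.g. from an adjacent frame) and the negatives would be embeddings of other instances.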
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the large-scale and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- Few-shot Object Counting and Detection [25.61294147822642]
We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class.
This task uses the same supervision as few-shot object counting but additionally outputs object bounding boxes along with the total object count.
We introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR.
arXiv Detail & Related papers (2022-07-22T10:09:18Z)
- Dynamic Proposals for Efficient Object Detection [48.66093789652899]
We propose a simple yet effective method which is adaptive to different computational resources by generating dynamic proposals for object detection.
Our method achieves significant speed-up across a wide range of detection models including two-stage and query-based models.
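Generating proposals adaptively to the available compute can be sketched as choosing a proposal count that fits a latency budget. The linear latency model and all constants below are assumptions for illustration, not values from the paper:

```python
def num_proposals_for_budget(budget_ms: float,
                             latency_per_proposal_ms: float = 0.05,
                             base_latency_ms: float = 10.0,
                             min_p: int = 10, max_p: int = 300) -> int:
    """Choose a proposal count that fits a latency budget.

    Assumes a hypothetical linear cost model:
    total_latency = base_latency + n_proposals * per_proposal_latency.
    """
    spare = budget_ms - base_latency_ms
    if spare <= 0:
        return min_p  # no spare time: run with the minimum proposals
    n = round(spare / latency_per_proposal_ms)
    return max(min_p, min(max_p, n))

# Tight budgets yield few proposals, generous budgets yield many.
print(num_proposals_for_budget(11.0))  # 20
print(num_proposals_for_budget(25.0))  # 300
```

The same budget-to-count mapping applies whether the "proposals" are two-stage region proposals or query slots in a query-based detector.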
arXiv Detail & Related papers (2022-07-12T01:32:50Z)
- Object DGCNN: 3D Object Detection using Dynamic Graphs [32.090268859180334]
3D object detection often involves complicated training and testing pipelines.
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds.
arXiv Detail & Related papers (2021-10-13T17:59:38Z)
- Few-shot Object Detection on Remote Sensing Images [11.40135025181393]
We introduce a few-shot learning-based method for object detection on remote sensing images.
We build our few-shot object detection model upon YOLOv3 architecture and develop a multi-scale object detection framework.
arXiv Detail & Related papers (2020-06-14T07:18:10Z)
- Dynamic Refinement Network for Oriented and Densely Packed Object Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH)
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.