Small Object Detection by DETR via Information Augmentation and Adaptive
Feature Fusion
- URL: http://arxiv.org/abs/2401.08017v1
- Date: Tue, 16 Jan 2024 00:01:23 GMT
- Title: Small Object Detection by DETR via Information Augmentation and Adaptive
Feature Fusion
- Authors: Ji Huang, Hui Wang
- Abstract summary: The RT-DETR model performs well in real-time object detection, but falls short in small object detection accuracy.
We propose an adaptive feature fusion algorithm that assigns learnable parameters to each feature map from different levels.
This enhances the model's ability to capture object features at different scales, thereby improving the accuracy of detecting small objects.
- Score: 4.9860018132769985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main challenge for small object detection algorithms is to ensure
accuracy while pursuing real-time performance. The RT-DETR model performs well
in real-time object detection, but its accuracy on small objects is poor. To
compensate for these shortcomings, this study proposes two key improvements.
First, RT-DETR feeds its Transformer only the final layer of backbone features.
The Transformer therefore receives only the semantic information from the
highest level of abstraction in the deep network and ignores lower-level
details such as edges, texture, and color gradients that are critical for
localizing small objects. Relying on deep features alone can also introduce
additional background noise, which hurts small object detection accuracy. To
address this issue, we propose a fine-grained path augmentation method that
supplies detailed information to the deep network and helps locate small
objects more accurately; as a result, the Transformer's input carries both
semantic and detailed information. Second, the RT-DETR decoder takes feature
maps of different levels as input after concatenating them with equal weights.
This operation cannot effectively handle the complex relationships among the
multi-scale information captured by feature maps of different sizes. We
therefore propose an adaptive feature fusion algorithm that assigns a learnable
parameter to each feature map from a different level, allowing the model to
adaptively fuse feature maps across levels and effectively integrate feature
information from different scales. This enhances the model's ability to
capture object features at different scales, thereby improving the accuracy of
detecting small objects.
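As a rough illustration of the second improvement, the sketch below shows how learnable per-level weights might replace equal-weight concatenation. This is a minimal PyTorch sketch under our own assumptions (the class name, softmax normalization of the weights, bilinear resizing, and the 1x1 projection are illustrative choices), not the authors' implementation.

```python
# Minimal sketch of learnable multi-level feature fusion; shapes and module
# names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFeatureFusion(nn.Module):
    """Fuse feature maps from several levels with learnable per-level weights."""

    def __init__(self, num_levels: int = 3, channels: int = 256):
        super().__init__()
        # One learnable scalar per feature level, instead of equal weights.
        self.level_weights = nn.Parameter(torch.ones(num_levels))
        # 1x1 conv to blend the weighted sum back into a common embedding.
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        # Resize every level to the spatial size of the first (finest) map.
        target_size = features[0].shape[-2:]
        resized = [
            F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for f in features
        ]
        # Normalize the learnable weights so they stay comparable during training.
        weights = torch.softmax(self.level_weights, dim=0)
        fused = sum(w * f for w, f in zip(weights, resized))
        return self.project(fused)

# Usage: three 256-channel feature maps at strides 8/16/32 of a 640x640 input.
if __name__ == "__main__":
    feats = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
    fused = AdaptiveFeatureFusion(num_levels=3, channels=256)(feats)
    print(fused.shape)  # torch.Size([1, 256, 80, 80])
```

In this form the network can learn to down-weight levels that contribute mostly background noise and up-weight the levels that carry small-object detail.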
Related papers
- Visible and Clear: Finding Tiny Objects in Difference Map [50.54061010335082]
We introduce a self-reconstruction mechanism into the detection model and find a strong correlation between it and tiny objects.
Specifically, we add a reconstruction head to the detector's neck and construct a difference map between the reconstructed image and the input, which is highly sensitive to tiny objects.
We further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny-object feature representation clearer.
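A difference map of this kind can be sketched in a few lines; the helper below is a hypothetical illustration of the general idea (absolute reconstruction error, channel-averaged and normalized), not the paper's DGFE module.

```python
# Illustrative sketch only: assumes a reconstruction head that outputs an
# image-sized tensor; the enhancement step that uses the map is not shown.
import torch

def difference_map(reconstructed: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel absolute difference, averaged over channels, scaled to [0, 1]."""
    diff = (reconstructed - image).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    return diff / (diff.amax(dim=(-2, -1), keepdim=True) + 1e-6)

# Regions the reconstruction fails to reproduce (often tiny objects) get values
# near 1 and can be used to re-weight detector features.
```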
arXiv Detail & Related papers (2024-05-18T12:22:26Z)
- Better Sampling, towards Better End-to-end Small Object Detection [7.7473020808686694]
Small object detection remains unsatisfactory due to the limited characteristics, high density, and mutual overlap of small objects.
We propose methods that enhance sampling within an end-to-end framework.
Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset.
arXiv Detail & Related papers (2024-05-17T04:37:44Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images [11.648151981111436]
This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection.
We modify object queries in different ways, using points, anchor boxes and adding positive and negative noise to the anchors to boost performance.
We evaluate our approach on the four graphical datasets: PubTables, TableBank, NTable and PubLaynet.
arXiv Detail & Related papers (2023-06-23T14:46:03Z)
- Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection [10.060030309684953]
Large-scale variation of remote-sensing targets is one of the main challenges in VHR remote-sensing object detection.
This paper proposes two novel modules: Guided Attention and Tucker Bilinear Attention.
Based on two modules, we build a new multi-scale remote sensing object detection framework.
arXiv Detail & Related papers (2023-03-09T15:20:03Z)
- Knowledge Distillation for Oriented Object Detection on Aerial Images [1.827510863075184]
We present a model compression method for rotated object detection on aerial images by knowledge distillation, namely KD-RNet.
Experiments on a large-scale aerial object detection dataset (DOTA) demonstrate that the proposed KD-RNet achieves improved mean average precision (mAP) with fewer parameters, while also providing high-quality detections with higher overlap with ground-truth annotations.
arXiv Detail & Related papers (2022-06-20T14:24:16Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multi-patch feature pyramid network (MPFP-Net).
MPFP-Net differs from current models, which during training pursue only the most discriminative patches.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
- You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from the background using a very lightweight first stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
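One simple way to picture this kind of modulation is sigmoid gating, where the semantic (Transformer) features decide how strongly each CNN response is kept. The module below is our own hedged sketch of that idea, not DA-DETR's actual CTBlender.

```python
# Hypothetical single-scale gating sketch; DA-DETR's CTBlender operates across
# multiple scales and is more involved than this illustration.
import torch
import torch.nn as nn

class GatedModulation(nn.Module):
    """Modulate CNN features with a gate computed from Transformer features."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, cnn_feat: torch.Tensor, trans_feat: torch.Tensor) -> torch.Tensor:
        # The gate (0..1) derived from semantic features scales each spatial
        # CNN response, keeping low-level localization cues where they matter.
        return cnn_feat * self.gate(trans_feat)
```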
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)