Small Object Detection by DETR via Information Augmentation and Adaptive
Feature Fusion
- URL: http://arxiv.org/abs/2401.08017v1
- Date: Tue, 16 Jan 2024 00:01:23 GMT
- Title: Small Object Detection by DETR via Information Augmentation and Adaptive
Feature Fusion
- Authors: Ji Huang, Hui Wang
- Abstract summary: The RT-DETR model performs well in real-time object detection, but falls short in small object detection accuracy.
We propose an adaptive feature fusion algorithm that assigns learnable parameters to each feature map from different levels.
This enhances the model's ability to capture object features at different scales, thereby improving the accuracy of detecting small objects.
- Score: 4.9860018132769985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main challenge for small object detection algorithms is to ensure
accuracy while pursuing real-time performance. The RT-DETR model performs well
in real-time object detection, but its accuracy on small objects is poor. To
compensate for these shortcomings, this study proposes two key improvements.
First, RT-DETR feeds its Transformer only the final layer of backbone features.
The Transformer therefore receives only the semantic information from the
highest level of abstraction in the deep network and ignores lower-level
details such as edges, texture, and color gradients that are critical for
localizing small objects. Relying on deep features alone can also introduce
additional background noise, which hurts small object detection accuracy. To
address this issue, we propose a fine-grained path augmentation method that
supplies detailed information to the deep network and helps locate small
objects more accurately; as a result, the Transformer's input carries both
semantic and detailed information. Second, the RT-DETR decoder takes feature
maps of different levels as input after concatenating them with equal weights.
This operation cannot effectively handle the complex relationships among the
multi-scale information captured by feature maps of different sizes. We
therefore propose an adaptive feature fusion algorithm that assigns a learnable
parameter to each feature map from a different level, allowing the model to
adaptively fuse feature maps across levels and effectively integrate feature
information from different scales. This enhances the model's ability to
capture object features at different scales, thereby improving the accuracy of
detecting small objects.
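As a rough illustration of the second improvement, the sketch below shows how learnable per-level weights might replace equal-weight concatenation. This is a minimal PyTorch sketch under our own assumptions (the class name, softmax normalization of the weights, bilinear resizing, and the 1x1 projection are illustrative choices), not the authors' implementation.

```python
# Minimal sketch of learnable multi-level feature fusion; shapes and module
# names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFeatureFusion(nn.Module):
    """Fuse feature maps from several levels with learnable per-level weights."""

    def __init__(self, num_levels: int = 3, channels: int = 256):
        super().__init__()
        # One learnable scalar per feature level, instead of equal weights.
        self.level_weights = nn.Parameter(torch.ones(num_levels))
        # 1x1 conv to blend the weighted sum back into a common embedding.
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        # Resize every level to the spatial size of the first (finest) map.
        target_size = features[0].shape[-2:]
        resized = [
            F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for f in features
        ]
        # Normalize the learnable weights so they stay comparable during training.
        weights = torch.softmax(self.level_weights, dim=0)
        fused = sum(w * f for w, f in zip(weights, resized))
        return self.project(fused)

# Usage: three 256-channel feature maps at strides 8/16/32 of a 640x640 input.
if __name__ == "__main__":
    feats = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
    fused = AdaptiveFeatureFusion(num_levels=3, channels=256)(feats)
    print(fused.shape)  # torch.Size([1, 256, 80, 80])
```

In this form the network can learn to down-weight levels that contribute mostly background noise and up-weight the levels that carry small-object detail.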
Related papers
- Visible and Clear: Finding Tiny Objects in Difference Map [50.54061010335082]
We introduce a self-reconstruction mechanism into the detection model and find a strong correlation between it and tiny objects.
Specifically, we add a reconstruction head to the detector's neck and construct a difference map between the reconstructed image and the input, which is highly sensitive to tiny objects.
We further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny-object feature representation clearer.
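A difference map of this kind can be sketched in a few lines; the helper below is a hypothetical illustration of the general idea (absolute reconstruction error, channel-averaged and normalized), not the paper's DGFE module.

```python
# Illustrative sketch only: assumes a reconstruction head that outputs an
# image-sized tensor; the enhancement step that uses the map is not shown.
import torch

def difference_map(reconstructed: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel absolute difference, averaged over channels, scaled to [0, 1]."""
    diff = (reconstructed - image).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    return diff / (diff.amax(dim=(-2, -1), keepdim=True) + 1e-6)

# Regions the reconstruction fails to reproduce (often tiny objects) get values
# near 1 and can be used to re-weight detector features.
```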
arXiv Detail & Related papers (2024-05-18T12:22:26Z)
- Better Sampling, towards Better End-to-end Small Object Detection [7.7473020808686694]
Small object detection remains unsatisfactory due to the limited characteristics, high density, and mutual overlap of small objects.
We propose methods that enhance sampling within an end-to-end framework.
Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset.
arXiv Detail & Related papers (2024-05-17T04:37:44Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images [11.648151981111436]
This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection.
We modify object queries in different ways, using points, anchor boxes and adding positive and negative noise to the anchors to boost performance.
We evaluate our approach on the four graphical datasets: PubTables, TableBank, NTable and PubLaynet.
arXiv Detail & Related papers (2023-06-23T14:46:03Z)
- Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection [10.060030309684953]
Large-scale variation of remote-sensing targets is one of the main challenges in VHR remote-sensing object detection.
This paper proposes two novel modules: Guided Attention and Tucker Bilinear Attention.
Based on two modules, we build a new multi-scale remote sensing object detection framework.
arXiv Detail & Related papers (2023-03-09T15:20:03Z)
- Knowledge Distillation for Oriented Object Detection on Aerial Images [1.827510863075184]
We present a model compression method for rotated object detection on aerial images by knowledge distillation, namely KD-RNet.
Experiments on a large-scale aerial object detection dataset (DOTA) demonstrate that the proposed KD-RNet achieves improved mean average precision (mAP) with fewer parameters, while also providing high-quality detections with higher overlap with ground-truth annotations.
arXiv Detail & Related papers (2022-06-20T14:24:16Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multi-patch feature pyramid network (MPFP-Net).
MPFP-Net differs from current models, which during training pursue only the most discriminative patches.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
- You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from the background using a very lightweight first stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion [53.25930448542148]
DA-DETR is a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain.
We introduce a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.
CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization.
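One simple way to picture this kind of modulation is sigmoid gating, where the semantic (Transformer) features decide how strongly each CNN response is kept. The module below is our own hedged sketch of that idea, not DA-DETR's actual CTBlender.

```python
# Hypothetical single-scale gating sketch; DA-DETR's CTBlender operates across
# multiple scales and is more involved than this illustration.
import torch
import torch.nn as nn

class GatedModulation(nn.Module):
    """Modulate CNN features with a gate computed from Transformer features."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, cnn_feat: torch.Tensor, trans_feat: torch.Tensor) -> torch.Tensor:
        # The gate (0..1) derived from semantic features scales each spatial
        # CNN response, keeping low-level localization cues where they matter.
        return cnn_feat * self.gate(trans_feat)
```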
arXiv Detail & Related papers (2021-03-31T13:55:56Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)