Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling
- URL: http://arxiv.org/abs/2603.03788v1
- Date: Wed, 04 Mar 2026 06:57:46 GMT
- Title: Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling
- Authors: Wenguang Tao, Xiaotian Wang, Tian Yan, Yi Wang, Jie Yan,
- Abstract summary: Small object detection under complex backgrounds is a challenging task due to severe feature degradation, weak semantic representation, and inaccurate localization.<n>Existing detection frameworks are mainly designed for general objects.<n>We propose a multi-level feature enhancement and global relation modeling framework tailored for small object detection.
- Score: 8.24377869183113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small object detection under complex backgrounds remains a challenging task due to severe feature degradation, weak semantic representation, and inaccurate localization caused by downsampling operations and background interference. Existing detection frameworks are mainly designed for general objects and often fail to explicitly address the unique characteristics of small objects, such as limited structural cues and strong sensitivity to localization errors. In this paper, we propose a multi-level feature enhancement and global relation modeling framework tailored for small object detection. Specifically, a Residual Haar Wavelet Downsampling module is introduced to preserve fine-grained structural details by jointly exploiting spatial-domain convolutional features and frequency-domain representations. To enhance global semantic awareness and suppress background noise, a Global Relation Modeling module is employed to capture long-range dependencies at high-level feature stages. Furthermore, a Cross-Scale Hybrid Attention module is designed to establish sparse and aligned interactions across multi-scale features, enabling effective fusion of high-resolution details and high-level semantic information with reduced computational overhead. Finally, a Center-Assisted Loss is incorporated to stabilize training and improve localization accuracy for small objects. Extensive experiments conducted on the large-scale RGBT-Tiny benchmark demonstrate that the proposed method consistently outperforms existing state-of-the-art detectors under both IoU-based and scale-adaptive evaluation metrics. These results validate the effectiveness and robustness of the proposed framework for small object detection in complex environments.
Related papers
- IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce sc sf IoUCert, a novel formal verification framework for anchor-based object detection.<n>We show that our method enables the verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z) - Noise-Robust Tiny Object Localization with Flows [63.60972031108944]
We propose a noise-robust localization framework leveraging normalizing flows for flexible error modeling and uncertainty-guided optimization.<n>Our method captures complex, non-Gaussian prediction distributions through flow-based error modeling, enabling robust learning under noisy supervision.<n>An uncertainty-aware gradient modulation mechanism further suppresses learning from high-uncertainty, noise-prone samples, mitigating overfitting while stabilizing training.
arXiv Detail & Related papers (2026-01-02T09:16:55Z) - Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data.<n>Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR)<n>In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z) - TinyDef-DETR: A Transformer-Based Framework for Defect Detection in Transmission Lines from UAV Imagery [12.48571944931548]
TinyDef-DETR is a framework designed to achieve accurate and efficient detection of transmission line defects from UAV-acquired images.<n>The model integrates four major components: an edge-enhanced ResNet backbone to strengthen boundary-sensitive representations, a stride-free space-to-depth module to enable detail-preserving downsampling, and a Focaler-Wise-SIoU regression loss to improve the localization of small and difficult objects.
arXiv Detail & Related papers (2025-09-07T12:36:33Z) - MGDFIS: Multi-scale Global-detail Feature Integration Strategy for Small Object Detection [12.838872442435527]
Small object detection in UAV imagery is crucial for applications such as search-and-rescue, traffic monitoring, and environmental surveillance.<n>Existing multi-scale fusion methods help, but add computational burden and blur fine details.<n>We propose a unified fusion framework that tightly couples global context with local detail to boost detection performance.
arXiv Detail & Related papers (2025-06-15T02:54:25Z) - Learning to Borrow Features for Improved Detection of Small Objects in Single-Shot Detectors [0.0]
We propose a novel framework that enables small object representations to "borrow" discriminative features from larger, semantically richer instances within the same class.<n>Our approach significantly boosts small object detection accuracy over baseline methods, offering a promising direction for robust object detection in complex visual environments.
arXiv Detail & Related papers (2025-04-30T01:18:33Z) - DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing [40.607660968380394]
Moving object detection (MOD) in remote sensing is significantly challenged by low resolution, extremely small object sizes, and complex noise interference.<n>Current deep learning-based MOD methods rely on probability density estimation, which restricts flexible information interaction between objects.<n>We propose a point-based MOD in remote sensing that iteratively recovers moving object centers from sparse noisy points.
arXiv Detail & Related papers (2025-04-14T14:44:52Z) - MSCA-Net:Multi-Scale Context Aggregation Network for Infrared Small Target Detection [0.1759252234439348]
This paper proposes a network architecture named MSCA-Net, which integrates three key components.<n>MSEDA employs a multi-scale feature fusion attention mechanism to adaptively aggregate information across different scales.<n>PCBAM captures the correlation between global and local features through a correlation matrix-based strategy.<n> CAB enhances the representation of critical features by assigning greater weights to them, integrating both low-level and high-level information.
arXiv Detail & Related papers (2025-03-21T14:42:31Z) - Efficient Feature Fusion for UAV Object Detection [9.632727117779178]
Small objects, in particular, occupy small portions of images, making their accurate detection difficult.<n>Existing multi-scale feature fusion methods address these challenges by aggregating features across different resolutions.<n>We propose a novel feature fusion framework specifically designed for UAV object detection tasks.
arXiv Detail & Related papers (2025-01-29T20:39:16Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Global Context-Aware Progressive Aggregation Network for Salient Object
Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.