Small Object Detection for Birds with Swin Transformer
- URL: http://arxiv.org/abs/2511.22310v1
- Date: Thu, 27 Nov 2025 10:42:37 GMT
- Title: Small Object Detection for Birds with Swin Transformer
- Authors: Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide,
- Abstract summary: We propose a specialized method for detecting a specific category of small objects; birds.<n>We employ Swin Transformer to upsample the image features and shift the window size for adapting to small objects.<n>Experiments show that the proposed Swin Transformer-based neck combined with CenterNet can lead to good performance by changing the window sizes.
- Score: 10.40030551931284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Other than the small size, it is also accompanied by difficulties due to blur, occlusion, and so on. Current small object detection methods are tailored to small and dense situations, such as pedestrians in a crowd or far objects in remote sensing scenarios. However, when the target object is small and sparse, there is a lack of objects available for training, making it more difficult to learn effective features. In this paper, we propose a specialized method for detecting a specific category of small objects; birds. Particularly, we improve the features learned by the neck; the sub-network between the backbone and the prediction head, to learn more effective features with a hierarchical design. We employ Swin Transformer to upsample the image features. Moreover, we change the shifted window size for adapting to small objects. Experiments show that the proposed Swin Transformer-based neck combined with CenterNet can lead to good performance by changing the window sizes. We further find that smaller window sizes (default 2) benefit mAPs for small object detection.
Related papers
- Visible and Clear: Finding Tiny Objects in Difference Map [50.54061010335082]
We introduce a self-reconstruction mechanism in the detection model, and discover the strong correlation between it and the tiny objects.
Specifically, we impose a reconstruction head in-between the neck of a detector, constructing a difference map of the reconstructed image and the input, which shows high sensitivity to tiny objects.
We further develop a Difference Map Guided Feature Enhancement (DGFE) module to make the tiny feature representation more clear.
arXiv Detail & Related papers (2024-05-18T12:22:26Z) - YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images [33.80392696735718]
YOLC (You Only Look Clusters) is an efficient and effective framework that builds on an anchor-free object detector, CenterNet.
To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection.
We perform extensive experiments on two aerial image datasets, including Visdrone 2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
arXiv Detail & Related papers (2024-04-09T10:03:44Z) - Improving the Detection of Small Oriented Objects in Aerial Images [0.0]
We propose a method to accurately detect small oriented objects in aerial images by enhancing the classification and regression tasks of the oriented object detection model.
We designed the Attention-Points Network consisting of two losses: Guided-Attention Loss (GALoss) and Box-Points Loss (BPLoss)
Experimental results show the effectiveness of our Attention-Points Network on a standard oriented aerial dataset with small object instances.
arXiv Detail & Related papers (2024-01-12T11:00:07Z) - On the Importance of Large Objects in CNN Based Object Detection
Algorithms [0.0]
We highlight the importance of large objects in learning features that are critical for all sizes.
We show that giving more weight to large objects leads to improved detection scores across all object sizes.
arXiv Detail & Related papers (2023-11-20T12:32:32Z) - Rethinking the backbone architecture for tiny object detection [0.0]
Existing tiny object detection methods use standard deep neural networks as their backbone architecture.
We argue that such backbones are inappropriate for detecting tiny objects as they are designed for the classification of larger objects, and do not have the spatial resolution to identify small targets.
We design 'bottom-heavy' versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall.
arXiv Detail & Related papers (2023-03-20T16:50:29Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z) - Decoupled Adaptation for Cross-Domain Object Detection [69.5852335091519]
Cross-domain object detection is more challenging than object classification.
D-adapt achieves a state-of-the-art results on four cross-domain object detection tasks.
arXiv Detail & Related papers (2021-10-06T08:43:59Z) - You Better Look Twice: a new perspective for designing accurate
detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from background using a very lite first-stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - A Simple and Effective Use of Object-Centric Images for Long-Tailed
Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.