SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers
- URL: http://arxiv.org/abs/2404.04179v1
- Date: Fri, 5 Apr 2024 15:48:36 GMT
- Title: SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers
- Authors: Weile Li, Muqing Shi, Zhonghua Hong,
- Abstract summary: Traditional deep learning-based object detection networks often resize images during the data pre-processing stage to achieve a uniform size and scale in the feature map.
We introduce Positional-Multi-head Criss-Cross Imagery to capture contextual information and learn from multiple representation subspaces.
This approach allows images of different sizes and scales to generate feature maps with uniform dimensions and can be employed in feature map propagation.
- Score: 0.42028553027796633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional deep learning-based object detection networks often resize images during the data preprocessing stage to achieve a uniform size and scale in the feature map. Resizing is done to facilitate model propagation and fully connected classification. However, resizing inevitably leads to object deformation and loss of valuable information in the images. This drawback becomes particularly pronounced for tiny objects like distribution towers with linear shapes and few pixels. To address this issue, we propose abandoning the resizing operation. Instead, we introduce Positional-Encoding Multi-head Criss-Cross Attention. This allows the model to capture contextual information and learn from multiple representation subspaces, effectively enriching the semantics of distribution towers. Additionally, we enhance Spatial Pyramid Pooling by reshaping three pooled feature maps into a new unified one while also reducing the computational burden. This approach allows images of different sizes and scales to generate feature maps with uniform dimensions and can be employed in feature map propagation. Our SCAResNet incorporates these aforementioned improvements into the backbone network ResNet. We evaluated our SCAResNet using the Electric Transmission and Distribution Infrastructure Imagery dataset from Duke University. Without any additional tricks, we employed various object detection models with Gaussian Receptive Field based Label Assignment as the baseline. When incorporating the SCAResNet into the baseline model, we achieved a 2.1% improvement in mAPs. This demonstrates the advantages of our SCAResNet in detecting transmission and distribution towers and its value in tiny object detection. The source code is available at https://github.com/LisavilaLee/SCAResNet_mmdet.
Related papers
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z) - SODAWideNet -- Salient Object Detection with an Attention augmented Wide
Encoder Decoder network without ImageNet pre-training [3.66237529322911]
We explore developing a neural network from scratch directly trained on Salient Object Detection without ImageNet pre-training.
We propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection.
Two variants, SODAWideNet-S (3.03M) and SODAWideNet (9.03M), achieve competitive performance against state-of-the-art models on five datasets.
arXiv Detail & Related papers (2023-11-08T16:53:44Z) - ScaleNet: An Unsupervised Representation Learning Method for Limited
Information [0.0]
A simple and efficient unsupervised representation learning method named ScaleNet is proposed.
Specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task.
The transferred parameters from a ScaleNet model with limited data improve the ImageNet Classification task by about 6% compared to the RotNet model.
arXiv Detail & Related papers (2023-10-03T19:13:43Z) - SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection [6.12477318852572]
Change detection (CD) aims to find the difference between two images at different times and outputs a change map to represent whether the region has changed or not.
Many State-of-The-Art (SoTA) methods design a deep learning model that has a powerful discriminative ability.
We propose our network, the Scale and Relation-Aware Siamese Network (SARAS-Net) to deal with this issue.
arXiv Detail & Related papers (2022-12-02T16:30:33Z) - a novel attention-based network for fast salient object detection [14.246237737452105]
In the current salient object detection network, the most popular method is using U-shape structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to 1/3 of the original size nearly without losing the accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z) - Multi-patch Feature Pyramid Network for Weakly Supervised Object
Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net)
MPFP-Net is different from the current models that during training only pursue the most discriminative patches.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z) - You Better Look Twice: a new perspective for designing accurate
detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from background using a very lite first-stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful but it would be computationally costly to directly apply convolutions on point data after voxelizing the entire point clouds to a dense regular 3D grid.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z) - Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z) - NETNet: Neighbor Erasing and Transferring Network for Better Single Shot
Object Detection [170.30694322460045]
We propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features.
A single-shot network called NETNet is constructed for scale-aware object detection.
arXiv Detail & Related papers (2020-01-18T15:21:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.