Underwater object detection in sonar imagery with detection transformer and Zero-shot neural architecture search
- URL: http://arxiv.org/abs/2505.06694v1
- Date: Sat, 10 May 2025 16:41:09 GMT
- Title: Underwater object detection in sonar imagery with detection transformer and Zero-shot neural architecture search
- Authors: XiaoTong Gu, Shengyu Tang, Yiming Cao, Changdong Yu,
- Abstract summary: Underwater object detection using sonar imagery has become a critical and rapidly evolving research domain within marine technology.<n>We specifically propose a Detection Transformer (DETR) architecture optimized with a Neural Architecture Search (NAS) approach.<n>This architecture achieves state-of-the-art performance on two Representative datasets.
- Score: 0.8624680612413766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underwater object detection using sonar imagery has become a critical and rapidly evolving research domain within marine technology. However, sonar images are characterized by lower resolution and sparser features compared to optical images, which seriously degrades the performance of object detection.To address these challenges, we specifically propose a Detection Transformer (DETR) architecture optimized with a Neural Architecture Search (NAS) approach called NAS-DETR for object detection in sonar images. First, an improved Zero-shot Neural Architecture Search (NAS) method based on the maximum entropy principle is proposed to identify a real-time, high-representational-capacity CNN-Transformer backbone for sonar image detection. This method enables the efficient discovery of high-performance network architectures with low computational and time overhead. Subsequently, the backbone is combined with a Feature Pyramid Network (FPN) and a deformable attention-based Transformer decoder to construct a complete network architecture. This architecture integrates various advanced components and training schemes to enhance overall performance. Extensive experiments demonstrate that this architecture achieves state-of-the-art performance on two Representative datasets, while maintaining minimal overhead in real-time efficiency and computational complexity. Furthermore, correlation analysis between the key parameters and differential entropy-based fitness function is performed to enhance the interpretability of the proposed framework. To the best of our knowledge, this is the first work in the field of sonar object detection to integrate the DETR architecture with a NAS search mechanism.
Related papers
- MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism [67.56918651825056]
We propose a new decoder architecture with the parallel Multi-time Inquiries (MI) mechanism.<n>Our MI based model, MI-DETR, outperforms all existing DETR-like models on COCO benchmark.<n>A series of diagnostic and visualization experiments demonstrate the effectiveness, rationality, and interpretability of MI.
arXiv Detail & Related papers (2025-03-03T12:19:06Z) - EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [20.209756662832365]
Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency.<n>We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.<n>We show that EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.
arXiv Detail & Related papers (2024-09-22T13:11:08Z) - Efficient and Accurate Hyperspectral Image Demosaicing with Neural Network Architectures [3.386560551295746]
This study investigates the effectiveness of neural network architectures in hyperspectral image demosaicing.
We introduce a range of network models and modifications, and compare them with classical methods and existing reference network approaches.
Results indicate that our networks outperform or match reference models in both datasets demonstrating exceptional performance.
arXiv Detail & Related papers (2023-12-21T08:02:49Z) - Multi-Objective Evolutionary for Object Detection Mobile Architectures
Search [21.14296703753317]
We propose a mobile object detection backbone network architecture search algorithm based on non-dominated sorting for NAS scenarios.
The proposed approach can search the backbone networks with different depths, widths, or expansion sizes via a technique of weight mapping.
Under similar computational complexity, the accuracy of the backbone network architecture we search for is 2.0% mAP higher than MobileDet.
arXiv Detail & Related papers (2022-11-05T00:28:49Z) - Neural Architecture Adaptation for Object Detection by Searching Channel
Dimensions and Mapping Pre-trained Parameters [17.090405682103167]
Most object detection frameworks use backbone architectures originally designed for image classification, conventionally with pre-trained parameters on ImageNet.
Recent neural architecture search (NAS) research has demonstrated that automatically designing a backbone specifically for object detection helps improve the overall accuracy.
We introduce a neural architecture adaptation method that can optimize the given backbone for detection purposes, while still allowing the use of pre-trained parameters.
arXiv Detail & Related papers (2022-06-17T02:01:56Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method-Vision Transformer with Convolutions Architecture Search (VTCAS)
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - NAS-FCOS: Efficient Search for Object Detection Architectures [113.47766862146389]
We propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) and the prediction head of a simple anchor-free object detector.
With carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs.
arXiv Detail & Related papers (2021-10-24T12:20:04Z) - NeighCNN: A CNN based SAR Speckle Reduction using Feature preserving
Loss Function [1.7188280334580193]
NeighCNN is a deep learning-based speckle reduction algorithm that handles multiplicative noise.
Various synthetic, as well as real SAR images, are used for testing the NeighCNN architecture.
arXiv Detail & Related papers (2021-08-26T04:20:07Z) - GLiT: Neural Architecture Search for Global and Local Image Transformer [114.8051035856023]
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Our method can find more discriminative and efficient transformer variants than the ResNet family and the baseline ViT for image classification.
arXiv Detail & Related papers (2021-07-07T00:48:09Z) - Neural Architecture Optimization with Graph VAE [21.126140965779534]
We propose an efficient NAS approach to optimize network architectures in a continuous space.
The framework jointly learns four components: the encoder, the performance predictor, the complexity predictor and the decoder.
arXiv Detail & Related papers (2020-06-18T07:05:48Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z) - Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.