Edge-guided Representation Learning for Underwater Object Detection
- URL: http://arxiv.org/abs/2306.00440v1
- Date: Thu, 1 Jun 2023 08:29:44 GMT
- Title: Edge-guided Representation Learning for Underwater Object Detection
- Authors: Linhui Dai, Hong Liu, Pinhao Song, Hao Tang, Runwei Ding, Shengquan Li
- Abstract summary: Underwater object detection is crucial for marine economic development, environmental protection, and the planet's sustainable development.
Main challenges of this task arise from low-contrast, small objects, and mimicry of aquatic organisms.
We propose an Edge-guided Representation Learning Network, termed ERL-Net, that aims to achieve discriminative representation learning and aggregation.
- Score: 15.832646455660278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underwater object detection (UOD) is crucial for marine economic development,
environmental protection, and the planet's sustainable development. The main
challenges of this task arise from low-contrast, small objects, and mimicry of
aquatic organisms. The key to addressing these challenges is to focus the model
on obtaining more discriminative information. We observe that the edges of
underwater objects are highly unique and can be distinguished from low-contrast
or mimicry environments based on their edges. Motivated by this observation, we
propose an Edge-guided Representation Learning Network, termed ERL-Net, that
aims to achieve discriminative representation learning and aggregation under
the guidance of edge cues. Firstly, we introduce an edge-guided attention
module to model the explicit boundary information, which generates more
discriminative features. Secondly, a feature aggregation module is proposed to
aggregate the multi-scale discriminative features by regrouping them into three
levels, effectively aggregating global and local information for locating and
recognizing underwater objects. Finally, we propose a wide and asymmetric
receptive field block to enable features to have a wider receptive field,
allowing the model to focus on more small object information. Comprehensive
experiments on three challenging underwater datasets show that our method
achieves superior performance on the UOD task.
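The abstract describes edge cues guiding feature learning but gives no implementation detail. The following is a minimal sketch of the general idea only, not ERL-Net's edge-guided attention module: it assumes a fixed Sobel filter as a stand-in for learned edge cues and a simple residual gate `feature * (1 + attention)`. The names `conv3x3`, `edge_attention`, and `apply_edge_guidance` are hypothetical and do not come from the paper.

```python
import math

# Fixed Sobel kernels; ERL-Net learns its edge cues, so this is only a stand-in.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def conv3x3(image, kernel):
    """Correlate a 2D list with a 3x3 kernel, zero-padded to keep the size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


def edge_attention(image):
    """Return the Sobel gradient magnitude scaled to [0, 1] as an attention map."""
    gx = conv3x3(image, SOBEL_X)
    gy = conv3x3(image, SOBEL_Y)
    mag = [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]
    peak = max(max(row) for row in mag)
    if peak == 0:
        # A flat image has no edges, so the attention map is all zeros.
        return [[0.0] * len(row) for row in mag]
    return [[v / peak for v in row] for row in mag]


def apply_edge_guidance(features, attention):
    """Residual gating: boost features near edges, leave the rest unchanged."""
    return [
        [f * (1.0 + a) for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(features, attention)
    ]
```

With a bright square on a dark background, the attention map is zero inside the square and on the far background, and peaks along the square's boundary, so the gate amplifies only boundary responses. In a real detector the map would be predicted by the network and applied to multi-channel feature maps rather than to a single 2D grid.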
Related papers
- SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling [12.390389688362506]
We propose a novel underwater object detection network that integrates multi-scale feature enhancement with global context modeling.
Experiments on the URPC2022 dataset demonstrate that the network outperforms the YOLOv8n baseline by more than 4.9% in mAP@0.5.
arXiv Detail & Related papers (2026-02-26T06:45:11Z) - Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
Camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings.
Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes.
We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z) - Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection [54.1960918379255]
Neptune-X is a data-centric generative-selection framework for maritime object detection.
X-to-Maritime is a multi-modality-conditioned generative model that synthesizes diverse and realistic maritime scenes.
Our approach sets a new benchmark in maritime scene synthesis, significantly improving detection accuracy.
arXiv Detail & Related papers (2025-09-25T04:59:02Z) - A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models [10.013311332835823]
Underwater object detection (UOD) is vital to diverse marine applications, including oceanographic research, underwater robotics, and marine conservation.
Current UOD methods are insufficient to fully address challenges like image degradation and small object detection in dynamic underwater environments.
Large vision-language models (LVLMs) hold significant promise for UOD, but their real-time application remains under-explored.
arXiv Detail & Related papers (2025-09-10T11:01:29Z) - SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models [10.671262416557704]
Vision Foundation Models (VFMs) offer powerful capabilities for visual understanding and reasoning.
We present a zero-shot object goal navigation framework that integrates the perceptual strength of VFMs with a model-based planner.
We evaluate our approach on the HM3D dataset using the Habitat simulator and demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-04T03:04:54Z) - Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations [9.949149600332836]
Self-supervised vision models and slot-based representations excel at identifying edge-derived geometry but fail to preserve non-geometric surface-level cues.
We show that learning an auxiliary latent space over segmented patches, where VAE regularisation enforces compact, disentangled object-centric representations, recovers these missing attributes.
arXiv Detail & Related papers (2025-03-12T21:57:41Z) - Sonar-based Deep Learning in Underwater Robotics: Overview, Robustness and Challenges [0.46873264197900916]
The predominant use of sonar in underwater environments, characterized by limited training data and inherent noise, poses challenges to model robustness.
This paper studies sonar-based perception task models, such as classification, object detection, segmentation, and SLAM.
It systematizes sonar-based state-of-the-art datasets, simulators, and robustness methods such as neural network verification, out-of-distribution, and adversarial attacks.
arXiv Detail & Related papers (2024-12-16T15:03:08Z) - Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts [25.187507472845944]
This study proposes a recognition framework called M3 to enhance the model's ability to capture robust patterns.
In this framework, an auxiliary task that focuses on target properties, such as estimating target size, is designed.
M3 incorporates multi-expert and multi-gate mechanisms, allowing for the allocation of distinct parameter spaces to various underwater signals.
arXiv Detail & Related papers (2024-11-05T03:52:36Z) - Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration [23.909530805458605]
Motivated by this observation, we propose a novel unsupervised registration method termed INTEGER to incorporate high-level contextual information for reliable pseudo-label mining.
Specifically, we propose the Feature-Geometry Coherence Mining module to dynamically adapt the teacher for each mini-batch of data during training and discover reliable pseudo-labels.
Lastly, we introduce a Mixed-Density Student to learn density-invariant features, addressing challenges related to density variation and low overlap in the outdoor scenario.
arXiv Detail & Related papers (2024-11-04T07:57:44Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility [1.9272863690919875]
We introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement.
Our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications.
This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs.
arXiv Detail & Related papers (2023-12-25T07:33:47Z) - ADOD: Adaptive Domain-Aware Object Detection with Residual Attention for Underwater Environments [1.2624532490634643]
This research presents ADOD, a novel approach to address domain generalization in underwater object detection.
Our method enhances the model's ability to generalize across diverse and unseen domains, ensuring robustness in various underwater environments.
arXiv Detail & Related papers (2023-12-11T19:20:56Z) - Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis [74.00441177577295]
Point cloud analysis faces computational system overhead, limiting its application on mobile or edge devices.
This paper explores feature distillation for lightweight point cloud models.
We propose bidirectional knowledge reconfiguration to distill informative contextual knowledge from the teacher to the student.
arXiv Detail & Related papers (2023-10-08T11:32:50Z) - Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Salient Object Detection via Integrity Learning [104.13483971954233]
Integrity is the concept of highlighting all parts that belong to a certain salient object.
To facilitate integrity learning for salient object detection, we design a novel Integrity Cognition Network (ICON).
ICON explores three important components to learn strong integrity features.
arXiv Detail & Related papers (2021-01-19T14:53:12Z) - Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
arXiv Detail & Related papers (2020-10-28T19:27:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.