AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
- URL: http://arxiv.org/abs/2502.01785v1
- Date: Mon, 03 Feb 2025 19:56:16 GMT
- Title: AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
- Authors: Basit Alawode, Iyyakutti Iyappan Ganapathi, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood
- Abstract summary: We introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. AquaticCLIP presents a new unsupervised learning framework that aligns images and texts in aquatic environments. Our model sets a new benchmark for vision-language applications in underwater environments.
- Score: 40.27548815196493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The preservation of aquatic biodiversity is critical in mitigating the effects of climate change. Aquatic scene understanding plays a pivotal role in aiding marine scientists in their decision-making processes. In this paper, we introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. AquaticCLIP presents a new unsupervised learning framework that aligns images and texts in aquatic environments, enabling tasks such as segmentation, classification, detection, and object counting. By leveraging our large-scale underwater image-text paired dataset without the need for ground-truth annotations, our model enriches existing vision-language models in the aquatic domain. For this purpose, we construct a dataset of 2 million underwater image-text pairs from heterogeneous resources, including YouTube, Netflix, NatGeo, etc. To fine-tune AquaticCLIP, we propose a prompt-guided vision encoder that progressively aggregates patch features via learnable prompts, while a vision-guided mechanism enhances the language encoder by incorporating visual context. The model is optimized through a contrastive pretraining loss to align the visual and textual modalities. AquaticCLIP achieves notable performance improvements in zero-shot settings across multiple underwater computer vision tasks, outperforming existing methods in both robustness and interpretability. Our model sets a new benchmark for vision-language applications in underwater environments. The code and dataset for AquaticCLIP are publicly available on GitHub at xxx.
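As a rough illustration of the two mechanisms the abstract names (learnable prompts that aggregate patch features, and a contrastive loss that aligns the two modalities), here is a minimal PyTorch sketch; all module names, dimensions, and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGuidedPooling(nn.Module):
    """Aggregates ViT patch features into a single image embedding via
    learnable prompts (an illustrative stand-in for the paper's
    prompt-guided vision encoder; all sizes are assumptions)."""
    def __init__(self, dim=512, num_prompts=8, num_heads=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_feats):             # (B, N_patches, dim)
        B = patch_feats.size(0)
        q = self.prompts.unsqueeze(0).expand(B, -1, -1)
        pooled, _ = self.attn(q, patch_feats, patch_feats)
        return pooled.mean(dim=1)               # (B, dim)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs sit on the
    diagonal of the batch similarity matrix."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature        # (B, B)
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage with random features standing in for encoder outputs
pool = PromptGuidedPooling()
img_emb = pool(torch.randn(4, 196, 512))        # e.g. 14x14 ViT patches
txt_emb = torch.randn(4, 512)
print(contrastive_loss(img_emb, txt_emb).item())
```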
Related papers
- Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures.
Previous works have introduced methods to adapt to turbidity and backscattering.
We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z) - Closer to Ground Truth: Realistic Shape and Appearance Labeled Data Generation for Unsupervised Underwater Image Segmentation [8.511846002129522]
We introduce a novel two-stage unsupervised segmentation approach that requires no human annotations and combines artificially created and real images.
Our method generates challenging synthetic training data by placing virtual fish in real-world underwater habitats.
We show its effectiveness on the specific case of salmon segmentation in underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature (30 GB).
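The core trick (pasting a rendered object onto a real underwater background, which yields the segmentation label for free) can be sketched in a few lines of NumPy; the blending details here are assumptions, not the paper's pipeline.

```python
import numpy as np

def composite(background, fg_rgba, x, y):
    """Paste an RGBA foreground (e.g. a rendered fish) onto a real
    underwater background. Returns the composite image plus a binary
    mask, which serves as a free segmentation label."""
    out = background.astype(np.float32).copy()
    mask = np.zeros(background.shape[:2], dtype=np.uint8)
    h, w = fg_rgba.shape[:2]
    alpha = fg_rgba[..., 3:4] / 255.0           # (h, w, 1) blend weights
    out[y:y+h, x:x+w] = (alpha * fg_rgba[..., :3]
                         + (1 - alpha) * out[y:y+h, x:x+w])
    mask[y:y+h, x:x+w] = (alpha[..., 0] > 0.5).astype(np.uint8)
    return out.astype(np.uint8), mask

bg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
fish = np.random.randint(0, 255, (64, 128, 4), dtype=np.uint8)
img, label = composite(bg, fish, x=100, y=200)
```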
arXiv Detail & Related papers (2025-03-20T11:34:45Z) - Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images.
This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
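The place-recognition and segmentation stages are pipeline-specific, but the middle step (matching local features between a past and a current frame, then robustly fitting a 2D transform) can be sketched with standard OpenCV calls; the detector choice and thresholds below are assumptions.

```python
import cv2
import numpy as np

def estimate_transform(img_ref, img_cur):
    """Match ORB features between a reference and a revisit frame
    (grayscale uint8 images), then fit a rotation+scale+translation
    transform with RANSAC. Returns a 2x3 matrix or None."""
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(img_ref, None)
    k2, d2 = orb.detectAndCompute(img_cur, None)
    if d1 is None or d2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    if len(matches) < 4:
        return None
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M
```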
arXiv Detail & Related papers (2025-03-06T05:13:19Z) - FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs).
Our framework relies solely on the 3D model and RGB images, alleviating the need for real pose annotations or other-modality data such as depth.
We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z) - Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to complex underwater conditions.
We construct the first large-scale underwater salient instance segmentation dataset (USIS10K).
We propose an Underwater Salient Instance Segmentation architecture based on the Segment Anything Model (USIS-SAM), designed specifically for the underwater domain.
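USIS-SAM's underwater-specific adaptations are not reproduced here; the sketch below only shows the vanilla Segment Anything API it builds on, generating class-agnostic masks that such an architecture would then refine for salient instances (the checkpoint path and the filtering step are placeholders).

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Official SAM weights; the path is a placeholder
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("underwater.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)   # dicts with 'segmentation',
                                         # 'area', 'predicted_iou', ...
# A salient-instance model would rank/filter these class-agnostic
# masks; sorting by area is only a crude stand-in for saliency.
masks = sorted(masks, key=lambda m: m["area"], reverse=True)
```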
arXiv Detail & Related papers (2024-06-10T06:17:33Z) - Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method [0.0]
We utilize the cycle-consistent learning technique of the state-of-the-art Cycle GAN model with a modified loss function.
We trained the Cycle GAN model with the modified loss functions on the benchmark Enhancing Underwater Visual Perception dataset.
The enhanced images yield better results than conventional models and further benefit underwater navigation, pose estimation, saliency prediction, object detection, and tracking.
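The summary does not spell out the loss modification; for reference, the standard cycle-consistency term the method starts from looks like this (the generator names and weight are placeholders).

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, lam=10.0):
    """Standard CycleGAN cycle term: translating A->B->A (and B->A->B)
    should reproduce the input image."""
    rec_a = G_ba(G_ab(real_a))          # A -> B -> A
    rec_b = G_ab(G_ba(real_b))          # B -> A -> B
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```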
arXiv Detail & Related papers (2024-04-11T11:12:06Z) - PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators [120.06891448820447]
Obtaining clear and visually pleasing images has become a widespread concern.
The task of underwater image enhancement (UIE) has emerged in response.
In this paper, we propose a physical model-guided GAN model for UIE, referred to as PUGAN.
Our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics.
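PUGAN's actual design couples a physics-guided subnetwork with its two discriminators; the sketch below shows only the generic shape of a dual-discriminator generator objective in LSGAN form, with every name a placeholder rather than the paper's architecture.

```python
import torch
import torch.nn.functional as F

def dual_discriminator_g_loss(D_style, D_detail, fake,
                              w_style=1.0, w_detail=1.0):
    """Generator loss against two critics, one judging global style/
    realism and one judging local detail (LSGAN targets of 1)."""
    s, d = D_style(fake), D_detail(fake)
    return (w_style * F.mse_loss(s, torch.ones_like(s))
            + w_detail * F.mse_loss(d, torch.ones_like(d)))
```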
arXiv Detail & Related papers (2023-06-15T07:41:12Z) - DeepAqua: Self-Supervised Semantic Segmentation of Wetland Surface Water Extent with SAR Images using Knowledge Distillation [44.99833362998488]
We present DeepAqua, a self-supervised deep learning model that eliminates the need for manual annotations during the training phase.
We exploit cases where optical- and radar-based water masks coincide, enabling the detection of both open and vegetated water surfaces.
Experimental results show that DeepAqua outperforms other unsupervised methods by improving accuracy by 7%, Intersection Over Union by 27%, and F1 score by 14%.
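The distillation signal can be illustrated concretely: an optical water index provides a free pseudo-label that supervises a segmentation network operating on the SAR image. The index formula is standard NDWI; the threshold and the training step are assumptions, not DeepAqua's exact recipe.

```python
import numpy as np

def ndwi_water_mask(green, nir, threshold=0.0):
    """Pseudo-label from optical bands: NDWI = (G - NIR) / (G + NIR),
    with water where NDWI exceeds the threshold. The mask supervises a
    SAR segmentation network, so no manual annotation is needed."""
    ndwi = (green - nir) / (green + nir + 1e-8)
    return (ndwi > threshold).astype(np.float32)

# Training step sketch (model and loss are placeholders):
#   pred = sar_model(sar_image)                # student on radar input
#   loss = bce(pred, ndwi_water_mask(g, nir))  # teacher: optical mask
```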
arXiv Detail & Related papers (2023-05-02T18:06:21Z) - Adaptive deep learning framework for robust unsupervised underwater image enhancement [3.0516727053033392]
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. We propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model. We show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics.
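For reference, the conditional-VAE objective such a framework would optimize combines a reconstruction term with a KL term toward a standard normal prior; the decoder interface and weighting below are assumptions.

```python
import torch
import torch.nn.functional as F

def cvae_loss(decoder, mu, logvar, target, cond, beta=1.0):
    """Conditional VAE objective: reconstruct the enhanced image from a
    latent z (reparameterization trick) plus the degraded input as the
    condition, regularized by a KL term toward N(0, I)."""
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = decoder(z, cond)
    rec_loss = F.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec_loss + beta * kl
```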
arXiv Detail & Related papers (2022-12-18T01:07:20Z) - Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has attracted increasing attention as a significant technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z) - Domain Adaptation for Underwater Image Enhancement via Content and Style Separation [7.077978580799124]
Underwater images suffer from color cast, low contrast, and haze due to light absorption, refraction, and scattering.
Recent learning-based methods demonstrate astonishing performance on underwater image enhancement.
We propose a domain adaptation framework for underwater image enhancement via content and style separation.
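Below is a minimal sketch of the content/style separation idea: one encoder keeps spatial content, another squeezes an image to a global style code, and swapping style codes across domains performs the adaptation. The layer sizes are arbitrary and this is not the paper's network.

```python
import torch
import torch.nn as nn

class ContentStyleSeparator(nn.Module):
    """Toy content/style disentanglement: content keeps spatial
    structure, style is a global vector; the decoder recombines them."""
    def __init__(self, ch=32, style_dim=8):
        super().__init__()
        self.content = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.style = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, style_dim))
        self.decode = nn.Conv2d(ch + style_dim, 3, 3, padding=1)

    def forward(self, img, style_src):
        c = self.content(img)                    # (B, ch, H, W)
        s = self.style(style_src)                # (B, style_dim)
        s = s[:, :, None, None].expand(-1, -1, *c.shape[2:])
        return self.decode(torch.cat([c, s], dim=1))

model = ContentStyleSeparator()
underwater, clean_ref = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
restyled = model(underwater, style_src=clean_ref)  # content of one image,
                                                   # style of the other
```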
arXiv Detail & Related papers (2022-02-17T09:30:29Z) - Domain Adaptive Adversarial Learning Based on Physics Model Feedback for Underwater Image Enhancement [10.143025577499039]
We propose a new robust adversarial learning framework that uses physics-model-based feedback control and a domain adaptation mechanism for enhancing underwater images.
A new method is proposed for simulating an underwater-like training dataset from RGB-D data using an underwater image formation model.
Final enhanced results on synthetic and real underwater images demonstrate the superiority of the proposed method.
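The underwater image formation model behind such RGB-D simulation is standard: attenuate the clean image per channel with depth and blend in a backscatter color. A minimal version, with illustrative coefficients, is sketched below.

```python
import numpy as np

def synthesize_underwater(rgb, depth, beta=(0.40, 0.12, 0.08),
                          backscatter=(0.05, 0.35, 0.45)):
    """Simplified formation model: I_c = J_c * t_c + B_c * (1 - t_c),
    with transmission t_c = exp(-beta_c * d). J is clean RGB in [0, 1],
    d is the depth map in metres; red attenuates fastest, hence the
    larger beta for R. All coefficient values are illustrative."""
    t = np.exp(-depth[..., None] * np.asarray(beta))   # (H, W, 3)
    return rgb * t + np.asarray(backscatter) * (1.0 - t)

rgb = np.random.rand(480, 640, 3)                  # clean in-air image
depth = np.random.uniform(0.5, 10.0, (480, 640))   # metres
underwater = synthesize_underwater(rgb, depth)
```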
arXiv Detail & Related papers (2020-02-20T07:50:00Z)