Related papers: Closer to Ground Truth: Realistic Shape and Appearance Labeled Data Generation for Unsupervised Underwater Image Segmentation

URL: http://arxiv.org/abs/2503.16051v1
Date: Thu, 20 Mar 2025 11:34:45 GMT
Title: Closer to Ground Truth: Realistic Shape and Appearance Labeled Data Generation for Unsupervised Underwater Image Segmentation
Authors: Andrei Jelea, Ahmed Nabil Belbachir, Marius Leordeanu
Abstract summary: We introduce a novel two-stage unsupervised segmentation approach that requires no human annotations and combines artificially created and real images. Our method generates challenging synthetic training data by placing virtual fish in real-world underwater habitats. We show its effectiveness on the specific case of salmon segmentation in underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature (30 GB).
Score: 8.511846002129522
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Solving fish segmentation in underwater videos, a real-world problem of great practical value in the marine and aquaculture industry, is a challenging task due to the difficulty of the filming environment, poor visibility and limited existing annotated underwater fish data. In order to overcome these obstacles, we introduce a novel two-stage unsupervised segmentation approach that requires no human annotations and combines artificially created and real images. Our method generates challenging synthetic training data by placing virtual fish in real-world underwater habitats, after performing fish transformations such as Thin Plate Spline shape warping and color Histogram Matching, which realistically integrate synthetic fish into the backgrounds, making the generated images increasingly closer to the real-world data with every stage of our approach. While we validate our unsupervised method on the popular DeepFish dataset, obtaining a performance close to a fully-supervised SoTA model, we further show its effectiveness on the specific case of salmon segmentation in underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature (30 GB). Moreover, on both datasets we prove the capability of our approach to boost the performance of the fully-supervised SoTA model.
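
The color Histogram Matching and compositing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `match_histogram` and `paste_fish` are hypothetical, the Thin Plate Spline warping step is omitted, and the fish image and its mask are assumed to be already aligned with the background. It only shows how matching the fish colors to the background histogram yields a composite image together with a free segmentation label.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap `source` values so their empirical CDF follows that of `reference`.

    Both inputs are arrays holding a single color channel.
    """
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distributions of both channels.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    matched = np.interp(src_cdf, ref_cdf, ref_values)
    return matched[src_idx].reshape(source.shape)


def paste_fish(background, fish, mask):
    """Paste the masked fish pixels into the background after color matching.

    background, fish: H x W x 3 uint8 images (assumed pre-aligned).
    mask: H x W boolean array marking fish pixels.
    Returns the composite image and the mask as a uint8 label map.
    """
    out = background.copy()
    for c in range(3):
        matched = match_histogram(fish[..., c][mask], background[..., c])
        out[..., c][mask] = np.clip(np.rint(matched), 0, 255).astype(np.uint8)
    return out, mask.astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.integers(0, 100, size=(32, 32, 3), dtype=np.uint8)   # dark scene
    fish = rng.integers(150, 256, size=(32, 32, 3), dtype=np.uint8)       # bright fish
    mask = np.zeros((32, 32), dtype=bool)
    mask[8:24, 8:24] = True
    image, label = paste_fish(background, fish, mask)
    print(image.shape, int(label.sum()))
```

Because the pasted region is defined by the mask, the label comes at no annotation cost, which is the property the abstract relies on for training without human annotations.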

Related papers

Pseudo-Label Guided Real-World Image De-weathering: A Learning Framework with Imperfect Supervision [57.5699142476311]
We propose a unified solution for real-world image de-weathering with non-ideal supervision. Our method exhibits significant advantages when trained on imperfectly aligned de-weathering datasets.
arXiv Detail & Related papers (2025-04-14T07:24:03Z)
Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z)
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis [40.27548815196493]
We introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. AquaticCLIP presents a new unsupervised learning framework that aligns images and texts in aquatic environments. Our model sets a new benchmark for vision-language applications in underwater environments.
arXiv Detail & Related papers (2025-02-03T19:56:16Z)
FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation [65.01601309903971]
We introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs). Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other-modality data like depths. We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and showcase significant performance improvements compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-09-25T03:54:01Z)
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to the complex underwater circumstances. We construct the first large-scale underwater salient instance segmentation dataset (USIS10K). We propose an Underwater Salient Instance architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain.
arXiv Detail & Related papers (2024-06-10T06:17:33Z)
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion [30.122666238416716]
We propose a novel pipeline for generating underwater images using accurate terrestrial depth data. This approach facilitates the training of supervised models for underwater depth estimation. We introduce a unique Depth2Underwater ControlNet, trained on specially prepared Underwater, Depth, Text data triplets.
arXiv Detail & Related papers (2023-12-19T08:56:33Z)
Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT). It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method has resulted in a significant performance improvement, of up to 5.0% AUC, of state-of-the-art (SOTA) visual trackers.
arXiv Detail & Related papers (2023-08-30T07:41:26Z)
DeepAqua: Self-Supervised Semantic Segmentation of Wetland Surface Water Extent with SAR Images using Knowledge Distillation [44.99833362998488]
We present DeepAqua, a self-supervised deep learning model that eliminates the need for manual annotations during the training phase. We exploit cases where optical- and radar-based water masks coincide, enabling the detection of both open and vegetated water surfaces. Experimental results show that DeepAqua outperforms other unsupervised methods by improving accuracy by 7%, Intersection Over Union by 27%, and F1 score by 14%.
arXiv Detail & Related papers (2023-05-02T18:06:21Z)
Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly out-performs existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
How to Track and Segment Fish without Human Annotations: A Self-Supervised Deep Learning Approach [3.0516727053033392]
Training deep neural networks (DNNs) for fish tracking and segmentation requires high-quality labels. We propose an unsupervised approach that relies on spatial and temporal variations in video data to generate noisy pseudo-ground-truth labels. Our framework consists of three stages: (1) an optical flow model generates the pseudo labels using spatial and temporal consistency between frames, (2) a self-supervised model refines the pseudo-labels incrementally, and (3) a segmentation network uses the refined labels for training.
arXiv Detail & Related papers (2022-08-23T01:01:27Z)
Overcoming Annotation Bottlenecks in Underwater Fish Segmentation: A Robust Self-Supervised Learning Approach [3.0516727053033392]
This paper introduces a novel self-supervised learning approach for fish segmentation using Deep Learning. Our model, trained without manual annotation, learns robust and generalizable representations by aligning features across augmented views. We demonstrate its effectiveness on three challenging underwater video datasets: DeepFish, Seagrass, and YouTube-VOS.
arXiv Detail & Related papers (2022-06-11T01:20:48Z)
A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis [2.6476746128312194]
We present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia. Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-28T12:20:59Z)
