Transformer-based Self-Supervised Fish Segmentation in Underwater Videos
- URL: http://arxiv.org/abs/2206.05390v1
- Date: Sat, 11 Jun 2022 01:20:48 GMT
- Title: Transformer-based Self-Supervised Fish Segmentation in Underwater Videos
- Authors: Alzayat Saleh, Marcus Sheaves, Dean Jerry, and Mostafa Rahimi Azghadi
- Abstract summary: We introduce a Transformer-based method that uses self-supervision for high-quality fish segmentation.
We show that when trained on a set of underwater videos from one dataset, the proposed model surpasses previous CNN-based and Transformer-based self-supervised methods.
- Score: 1.9249287163937976
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Underwater fish segmentation to estimate fish body measurements is still
largely unsolved due to the complex underwater environment. Relying on
fully-supervised segmentation models requires collecting per-pixel labels,
which is time-consuming and prone to overfitting. Self-supervised learning
methods can help avoid the requirement of large annotated training datasets;
however, to be useful in real-world applications, they should achieve good
segmentation quality. In this paper, we introduce a Transformer-based method
that uses self-supervision for high-quality fish segmentation. Our proposed
model is trained on videos -- without any annotations -- to perform fish
segmentation in underwater videos taken in situ in the wild. We show that when
trained on a set of underwater videos from one dataset, the proposed model
surpasses previous CNN-based and Transformer-based self-supervised methods and
achieves performance relatively close to supervised methods on two new unseen
underwater video datasets. This demonstrates the great generalisability of our
model and the fact that it does not need a pre-trained model. In addition, we
show that, due to its dense representation learning, our model is
compute-efficient. We provide quantitative and qualitative results that
demonstrate our model's significant capabilities.
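The abstract does not spell out the training pipeline, but the general recipe behind Transformer-based self-supervised segmentation can be illustrated briefly. The sketch below is an assumption-laden stand-in, not the authors' method: it pulls dense patch features from a publicly released self-supervised ViT (DINO ViT-S/16) and splits them with a crude two-way k-means into a coarse foreground/background patch mask for a single frame.

```python
# A rough sketch of self-supervised dense features driving segmentation:
# extract per-patch features from a self-supervised ViT and cluster them
# into a coarse foreground/background mask. The DINO backbone, patch size,
# and 2-way k-means are illustrative assumptions, not the paper's pipeline.
import torch
import torch.nn.functional as F

# Publicly released self-supervised ViT-S/16 (DINO).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

@torch.no_grad()
def dense_features(frame: torch.Tensor) -> torch.Tensor:
    """frame: (1, 3, H, W) with H, W divisible by 16 -> (N, C) patch features."""
    tokens = model.get_intermediate_layers(frame, n=1)[0]  # (1, 1 + N, C)
    return tokens[0, 1:]                                   # drop the [CLS] token

def two_way_kmeans(feats: torch.Tensor, iters: int = 10) -> torch.Tensor:
    """Crude 2-cluster k-means on unit-normalised features -> (N,) labels."""
    feats = F.normalize(feats, dim=-1)
    centers = feats[torch.randperm(len(feats))[:2]].clone()
    for _ in range(iters):
        assign = (feats @ centers.T).argmax(dim=1)         # cosine similarity
        for k in range(2):
            if (assign == k).any():
                centers[k] = F.normalize(feats[assign == k].mean(0), dim=0)
    return assign

frame = torch.rand(1, 3, 224, 224)        # stand-in for one video frame
labels = two_way_kmeans(dense_features(frame))
patch_mask = labels.reshape(14, 14)       # 224 / 16 = 14 patches per side
```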
Related papers
- Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset [60.14089302022989]
Underwater vision tasks often suffer from low segmentation accuracy due to complex underwater conditions.
We construct the first large-scale underwater salient instance segmentation dataset (USIS10K).
We propose an Underwater Salient Instance Segmentation architecture based on the Segment Anything Model (USIS-SAM) specifically for the underwater domain.
arXiv Detail & Related papers (2024-06-10T06:17:33Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage encoder that processes spatial and then temporal information.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- iBoot: Image-bootstrapped Self-Supervised Video Representation Learning [45.845595749486215]
Video self-supervised learning (SSL) suffers from added challenges: video datasets are typically not as large as image datasets.
We propose to utilize a strong image-based model, pre-trained with self- or language supervision, in a video representation learning framework.
The proposed algorithm is shown to learn much more efficiently in fewer epochs and with smaller batches.
arXiv Detail & Related papers (2022-06-16T17:42:48Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers (see the tokenisation sketch after this list).
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
- A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis [2.6476746128312194]
We present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks.
The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia.
Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-28T12:20:59Z)
- Deep Learning based Segmentation of Fish in Noisy Forward Looking MBES Images [1.5469452301122177]
We build on recent advances in Deep Learning (DL) and Convolutional Neural Networks (CNNs) for semantic segmentation.
We demonstrate an end-to-end approach for a fish/non-fish probability prediction for all range-azimuth positions projected by an imaging sonar.
We show that our model achieves the desired performance and has learned to harness the importance of semantic context.
arXiv Detail & Related papers (2020-06-16T09:57:38Z)
- Semantic Segmentation of Underwater Imagery: Dataset and Benchmark [13.456412091502527]
We present the first large-scale dataset for semantic analysis of Underwater IMagery (SUIM).
It contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and the waterbody background.
We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics (see the mean-IoU sketch after this list).
arXiv Detail & Related papers (2020-04-02T19:53:14Z)
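For the ViViT entry above, the core tokenisation step can be made concrete. Below is a minimal sketch of tubelet embedding under assumed sizes (16-frame clips, 2x16x16 tubelets, 768-dim tokens, a tiny two-layer encoder), not the paper's actual configuration:

```python
# A minimal sketch of ViViT-style "tubelet" tokenisation: a 3D convolution
# maps non-overlapping spatio-temporal patches of a video clip to a token
# sequence for a transformer encoder. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    def __init__(self, in_ch=3, dim=768, tubelet=(2, 16, 16)):
        super().__init__()
        # kernel_size == stride => non-overlapping tubelets
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, C, T, H, W) -> tokens: (B, N, dim)
        x = self.proj(video)                  # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)

clip = torch.rand(1, 3, 16, 224, 224)         # 16-frame RGB clip
tokens = TubeletEmbed()(clip)                 # (1, 8 * 14 * 14, 768)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
features = encoder(tokens)                    # encoded spatio-temporal tokens
```

Because the kernel size equals the stride, each token summarises one non-overlapping spatio-temporal patch, which is what lets a standard transformer encoder consume raw video.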
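For the SUIM benchmark entry, the "standard performance metrics" typically include mean intersection-over-union. A minimal sketch follows, with the eight-class count mirroring SUIM's categories and random label maps as placeholders:

```python
# A minimal mean-IoU computation over integer label maps. The class count
# and the random inputs below are illustrative placeholders.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """pred, gt: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Example: 8-class label maps, e.g. SUIM-style categories
pred = np.random.randint(0, 8, size=(240, 320))
gt = np.random.randint(0, 8, size=(240, 320))
print(f"mIoU = {mean_iou(pred, gt, num_classes=8):.3f}")
```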