Underwater Camouflaged Object Tracking Meets Vision-Language SAM2
- URL: http://arxiv.org/abs/2409.16902v4
- Date: Mon, 28 Apr 2025 22:11:02 GMT
- Title: Underwater Camouflaged Object Tracking Meets Vision-Language SAM2
- Authors: Chunhui Zhang, Li Liu, Guanjie Huang, Zhipeng Zhang, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang,
- Abstract summary: We propose the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220.<n>Based on the proposed dataset, this work first evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments.<n>Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects.
- Score: 60.47622353256502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale datasets. However, these datasets have primarily focused on open-air scenarios and have largely overlooked underwater animal tracking-especially the complex challenges posed by camouflaged marine animals. To bridge this gap, we take a step forward by proposing the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this work first comprehensively evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments, \eg, coral reefs. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects. Furthermore, we propose a novel vision-language tracking framework called VL-SAM2, based on the video foundation model SAM2. Experimental results demonstrate that our VL-SAM2 achieves state-of-the-art performance on the UW-COT220 dataset. The dataset and codes are available at~\href{https://github.com/983632847/Awesome-Multimodal-Object-Tracking}{\color{magenta}{here}}.
Related papers
- CamSAM2: Segment Anything Accurately in Camouflaged Videos [37.0152845263844]
We propose Camouflaged SAM2 (CamSAM2) to handle camouflaged scenes without modifying SAM2's parameters.
To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules.
While CamSAM2 only adds negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets.
arXiv Detail & Related papers (2025-03-25T14:58:52Z) - When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation [36.174458990817165]
This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS)
VCOS involves detecting objects that blend seamlessly in the surroundings for videos, due to similar colors and textures, poor light conditions, etc.
arXiv Detail & Related papers (2024-09-27T11:35:50Z) - Camouflaged Object Tracking: A Benchmark [16.07670491479613]
We introduce the Camouflaged Object Tracking dataset (COTD), a benchmark for evaluating camouflaged object tracking methods.
COTD comprises 200 sequences and approximately 80,000 frames, each annotated with detailed bounding boxes.
Our evaluation of 20 existing tracking algorithms reveals significant deficiencies in their performance with camouflaged objects.
We propose a novel tracking framework, HiPTrack-MLS, which demonstrates promising results in improving tracking performance for camouflaged objects.
arXiv Detail & Related papers (2024-08-25T15:56:33Z) - Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track [28.52754012142431]
Segment Anything Model 2 (SAM 2) is a foundation model towards solving promptable visual segmentation in images and videos.
SAM 2 builds a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date.
Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th place for 6th LSVOS Challenge VOS Track.
arXiv Detail & Related papers (2024-08-19T16:13:14Z) - Evaluation of Segment Anything Model 2: The Role of SAM2 in the Underwater Environment [2.0554501265326794]
The Segment Anything Model (SAM) and its extensions have been attempted for applications in various underwater visualization tasks in marine sciences.
Recently, Meta has developed the Segment Anything Model 2 (SAM2), which significantly improves running speed and segmentation accuracy.
This report aims to explore the potential of SAM2 in marine science by evaluating it on the underwater instance segmentation datasets benchmark UIIS and USIS10K.
arXiv Detail & Related papers (2024-08-06T03:20:10Z) - MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method enables to extract richer marine information from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z) - Moving Object Segmentation: All You Need Is SAM (and Flow) [82.78026782967959]
We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects.
In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt.
These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks.
arXiv Detail & Related papers (2024-04-18T17:59:53Z) - Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM [62.85895749882285]
Marine Animal (MAS) involves segmenting animals within marine environments.
We propose a novel feature learning framework, named Dual-SAM for high-performance MAS.
Our proposed method achieves state-of-the-art performances on five widely-used MAS datasets.
arXiv Detail & Related papers (2024-04-07T15:34:40Z) - Improving Underwater Visual Tracking With a Large Scale Dataset and
Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT)
It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles.
We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality.
The method has resulted in a significant performance improvement, of up to 5.0% AUC, of state-of-the-art (SOTA) visual trackers.
arXiv Detail & Related papers (2023-08-30T07:41:26Z) - BrackishMOT: The Brackish Multi-Object Tracking Dataset [20.52569822945148]
There exist no publicly available annotated underwater multi-object tracking (MOT) datasets captured in turbid environments.
BrackishMOT consists of 98 sequences captured in the wild. Alongside the novel dataset, we present baseline results by training a state-of-the-art tracker.
We analyse the effects of including synthetic data during training and show that a combination of real and synthetic underwater training data can enhance tracking performance.
arXiv Detail & Related papers (2023-02-21T13:02:36Z) - AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility [125.77396380698639]
AVisT is a benchmark for visual tracking in diverse scenarios with adverse visibility.
AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios.
We benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes.
arXiv Detail & Related papers (2022-08-14T17:49:37Z) - SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater
Robots [16.242924916178282]
This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots.
Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images.
arXiv Detail & Related papers (2020-11-12T08:17:21Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.