ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection
- URL: http://arxiv.org/abs/2505.01431v2
- Date: Fri, 18 Jul 2025 23:46:11 GMT
- Title: ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection
- Authors: Wenqi Guo, Mohamed Shehata, Shan Du,
- Abstract summary: This work studies how to avoid training by integrating large pre-trained models like SAM-2 and Owl-v2 with temporal information into a modular pipeline.<n>Our approach also surpasses supervised methods, increasing the F-measure from 0.476 to 0.628.
- Score: 7.457821910654639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflaged object segmentation presents unique challenges compared to traditional segmentation tasks, primarily due to the high similarity in patterns and colors between camouflaged objects and their backgrounds. Effective solutions to this problem have significant implications in critical areas such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has predominantly emphasized supervised or unsupervised pre-training methods, leaving zero-shot approaches significantly underdeveloped. Existing zero-shot techniques commonly utilize the Segment Anything Model (SAM) in automatic mode or rely on vision-language models to generate cues for segmentation; however, their performances remain unsatisfactory, due to the similarity of the camouflaged object and the background. This work studies how to avoid training by integrating large pre-trained models like SAM-2 and Owl-v2 with temporal information into a modular pipeline. Evaluated on the MoCA-Mask dataset, our approach achieves outstanding performance improvements, significantly outperforming existing zero-shot methods by raising the F-measure ($F_\beta^w$) from 0.296 to 0.628. Our approach also surpasses supervised methods, increasing the F-measure from 0.476 to 0.628. Additionally, evaluation on the MoCA-Filter dataset demonstrates an increase in the success rate from 0.628 to 0.697 when compared with FlowSAM, a supervised transfer method. A thorough ablation study further validates the individual contributions of each component. Besides our main contributions, we also highlight inconsistencies in previous work regarding metrics and settings. Code can be found in https://github.com/weathon/vcos.
Related papers
- Segment Concealed Objects with Incomplete Supervision [63.637733655439334]
Incompletely-Supervised Concealed Object (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments.<n>This task remains highly challenging due to the limited supervision provided by the incompletely annotated training data.<n>In this paper, we introduce the first unified method for ISCOS to address these challenges.
arXiv Detail & Related papers (2025-06-10T16:25:15Z) - Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel textbfSelf-textbfPerceptinon textbfTuning (textbfSPT) method for anomaly segmentation.<n>The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z) - Deep Unlearn: Benchmarking Machine Unlearning for Image Classification [7.450700594277741]
Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model.<n>This paper investigates 18 state-of-the-art MU methods across various benchmark datasets and models.
arXiv Detail & Related papers (2024-10-02T06:41:58Z) - Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection [57.666055329221194]
We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks.
Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
arXiv Detail & Related papers (2024-09-07T05:31:24Z) - Unsupervised Fault Detection using SAM with a Moving Window Approach [0.0]
We present an unsupervised method that uses the high end Segment Anything Model (SAM) and a moving window approach.
We aim to overcome these challenges without requiring fine tun ing or labeled data.
arXiv Detail & Related papers (2024-07-08T18:12:29Z) - NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation [2.2559617939136505]
We propose a simple and training-free method to enhance the validity and robustness of the matching strategy.
The core concept involves randomly dropping feature channels (setting them to zero) during the matching process.
This technique mimics discarding pathological nubbles, and it can be seamlessly applied to other similarity computing scenarios.
arXiv Detail & Related papers (2024-05-19T08:00:38Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.<n>We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and
Segment Anything Model [14.080744645704751]
Open World Object Detection (OWOD) is a novel and challenging computer vision task.
We propose a simple yet effective learning strategy, namely Decoupled Objectness Learning (DOL), which divides the learning of these two boundaries into decoder layers.
We also introduce an Auxiliary Supervision Framework (ASF) that uses a pseudo-labeling and a soft-weighting strategies to alleviate the negative impact of noise.
arXiv Detail & Related papers (2023-06-04T06:42:09Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion
Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised approach.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Adversarial Detection without Model Information [4.4890837610757695]
We propose a model independent adversarial detection method using a simple energy function.
We train a standalone detector independent of the underlying model, with sequential layer-wise training to increase the energy separation corresponding to natural and adversarial inputs.
Our method achieves state-of-the-art detection performance (ROC-AUC > 0.9) across a wide range of gradient, score and decision-based adversarial attacks.
arXiv Detail & Related papers (2022-02-09T04:38:16Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.