Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track
- URL: http://arxiv.org/abs/2509.14901v1
- Date: Thu, 18 Sep 2025 12:23:51 GMT
- Title: Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track
- Authors: An Yan, Leilei Cao, Feng Lu, Ran Hong, Youhai Jiang, Fengjie Zhu
- Abstract summary: Complex Video Object Segmentation (VOS) presents significant challenges in accurately segmenting objects across frames. We present our solution for the LSVOS 2025 VOS Track based on the SAM2 framework. We achieve a J&F score of 0.8616 on the MOSE test set -- +1.4 points over our SAM2Long baseline -- securing 2nd place in the LSVOS 2025 VOS Track.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Complex Video Object Segmentation (VOS) presents significant challenges in accurately segmenting objects across frames, especially in the presence of small and similar targets, frequent occlusions, rapid motion, and complex interactions. In this report, we present our solution for the LSVOS 2025 VOS Track based on the SAM2 framework. We adopt a pseudo-labeling strategy during training: a trained SAM2 checkpoint is deployed within the SAM2Long framework to generate pseudo labels for the MOSE test set, which are then combined with existing data for further training. For inference, the SAM2Long framework is employed to obtain our primary segmentation results, while an open-source SeC model runs in parallel to produce complementary predictions. A cascaded decision mechanism dynamically integrates outputs from both models, exploiting the temporal stability of SAM2Long and the concept-level robustness of SeC. Benefiting from pseudo-label training and cascaded multi-model inference, our approach achieves a J&F score of 0.8616 on the MOSE test set -- +1.4 points over our SAM2Long baseline -- securing 2nd place in the LSVOS 2025 VOS Track, and demonstrating strong robustness and accuracy in long, complex video segmentation scenarios.
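The abstract does not specify the exact rule the cascaded decision mechanism uses to arbitrate between the SAM2Long and SeC predictions. The sketch below is a minimal per-frame illustration under assumed heuristics (the IoU threshold, the empty-mask fallback, and the size-ratio test are all hypothetical, not taken from the paper): prefer the temporally stable primary mask, but defer to the concept-aware secondary mask when the primary has lost the object or when the two disagree strongly.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum()) / float(union)

def cascaded_decision(primary: np.ndarray, secondary: np.ndarray,
                      iou_thresh: float = 0.5) -> np.ndarray:
    """Choose between the primary (temporally stable, SAM2Long-style) and the
    secondary (concept-aware, SeC-style) mask for one frame.

    The fallback rule here is a hypothetical illustration: keep the primary
    mask by default, but switch to the secondary one when the primary has
    lost the object entirely, or when the two masks disagree strongly and
    the secondary mask is substantially larger (a crude proxy for the
    concept-aware model re-detecting the target).
    """
    if primary.sum() == 0 and secondary.sum() > 0:
        return secondary  # primary tracker lost the object; fall back
    if (mask_iou(primary, secondary) < iou_thresh
            and secondary.sum() > 2 * primary.sum()):
        return secondary  # strong disagreement and secondary dominates
    return primary        # default: trust temporal stability
```

In a real pipeline this rule would run per frame over the two models' mask streams; any confidence scores the models expose could replace the size-ratio proxy used here.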
Related papers
- SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
Surgical segmentation is crucial for computer-assisted surgery, enabling precise localization and tracking of instruments and tissues. Interactive Video Object Segmentation (iVOS) models such as Segment Anything Model 2 (SAM2) provide prompt-based flexibility beyond methods with predefined categories, but face challenges in surgical scenarios due to the domain gap and limited long-term tracking. We construct SA-SV, the largest surgical iVOS benchmark with instance-level temporal annotations (masklets) spanning eight procedure types (61k frames, 1.6k masklets). We propose SAM2S, a foundation model enhancing SAM2 for surgical iVOS.
arXiv Detail & Related papers (2025-11-20T18:18:49Z) - Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
We propose UAP-SAM2, the first cross-prompt universal adversarial attack against SAM2 driven by dual semantic deviation. We show that UAP-SAM2 significantly outperforms state-of-the-art (SOTA) attacks by a large margin.
arXiv Detail & Related papers (2025-10-28T08:59:11Z) - LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation
This report presents an overview of the 7th Large-scale Video Object Segmentation (LSVOS) Challenge, held in conjunction with ICCV 2025. The 2025 edition features a newly introduced track, Complex VOS (MOSEv2). We summarize datasets and protocols, highlight top-performing solutions, and distill emerging trends.
arXiv Detail & Related papers (2025-10-13T07:02:09Z) - The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
The solution achieves a J&F score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
arXiv Detail & Related papers (2025-09-23T15:58:13Z) - SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
arXiv Detail & Related papers (2025-09-22T08:30:34Z) - Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization
Seg2Track-SAM2 is a framework that integrates pre-trained object detectors with SAM2 and a novel Seg2Track module. Seg2Track-SAM2 achieves state-of-the-art (SOTA) performance, ranking fourth overall in both car and pedestrian classes on KITTI MOTS. Results confirm that Seg2Track-SAM2 advances MOTS by combining robust zero-shot tracking, enhanced identity preservation, and efficient memory utilization.
arXiv Detail & Related papers (2025-09-15T10:52:27Z) - SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
We propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet. We extend the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. Our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs.
arXiv Detail & Related papers (2025-08-05T15:36:13Z) - SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
Segment Anything 2 (SAM2) enables robust single-object tracking using segmentation. We propose SAM2MOT, a novel Tracking-by-Segmentation paradigm for multi-object tracking. SAM2MOT directly generates tracking boxes from segmentation masks, reducing reliance on detection accuracy.
arXiv Detail & Related papers (2025-04-06T15:32:08Z) - Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2
This report focuses on the construction of the overall Det-SAM2 framework and the subsequent engineering optimization applied to SAM2. We present a case demonstrating an application built on the Det-SAM2 framework: AI refereeing in a billiards scenario, derived from our business context.
arXiv Detail & Related papers (2024-11-28T07:58:30Z) - Underwater Camouflaged Object Tracking Meets Vision-Language SAM2
We propose the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this work first evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects.
arXiv Detail & Related papers (2024-09-25T13:10:03Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation
We propose a method to enhance the temporal consistency of the referring object segmentation model.
Our method placed 2nd in the final ranking of the RVOS Track at the ECCV 2024 LSVOS Challenge.
arXiv Detail & Related papers (2024-08-22T14:43:02Z) - Stable Segment Anything Model
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) improved segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)