SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
- URL: http://arxiv.org/abs/2509.17500v1
- Date: Mon, 22 Sep 2025 08:30:34 GMT
- Title: SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
- Authors: Yujie Xie, Hongyang Zhang, Zhihui Liu, Shihai Ruan
- Abstract summary: Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
- Score: 9.131199997701282
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences, where difficulties stem from object reappearance, small-scale targets, heavy occlusions, and crowded scenes. Existing approaches predominantly adopt SAM2-based frameworks with various memory mechanisms for complex video mask generation. In this report, we propose Segment Anything with Memory Strengthened Object Navigation (SAMSON), the 3rd place solution in the MOSE track of ICCV 2025, which integrates the strengths of state-of-the-art VOS models into an effective paradigm. To handle visually similar instances and long-term object disappearance in MOSE, we incorporate a long-term memory module for reliable object re-identification. Additionally, we adopt SAM2Long as a post-processing strategy to reduce error accumulation and enhance segmentation stability in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
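The J&F score reported above averages region similarity (J, the Jaccard index, i.e. IoU of predicted and ground-truth masks) and contour accuracy (F, an F-measure over mask boundaries). A minimal NumPy sketch of the idea follows; note it uses a simplified exact-match boundary comparison rather than the dilated boundary matching of the official DAVIS evaluation toolkit:

```python
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def boundary(mask: np.ndarray) -> np.ndarray:
    """Crude boundary map: mask pixels that differ from a 4-neighbour."""
    b = np.zeros_like(mask, dtype=bool)
    b[:-1, :] |= mask[:-1, :] != mask[1:, :]
    b[1:, :] |= mask[1:, :] != mask[:-1, :]
    b[:, :-1] |= mask[:, :-1] != mask[:, 1:]
    b[:, 1:] |= mask[:, 1:] != mask[:, :-1]
    return b & mask.astype(bool)

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    """Contour accuracy F: F1 between predicted and ground-truth boundaries."""
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    inter = np.logical_and(pb, gb).sum()
    precision = inter / max(pb.sum(), 1)
    recall = inter / max(gb.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def j_and_f(pred: np.ndarray, gt: np.ndarray) -> float:
    """J&F: the mean of region similarity and contour accuracy."""
    return (jaccard(pred, gt) + f_measure(pred, gt)) / 2
```

A leaderboard score such as 0.8427 is this quantity averaged over all objects and frames of the test set.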
Related papers
- LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation [186.14566815158506]
This report presents an overview of the 7th Large-scale Video Object Segmentation (LSVOS) Challenge held in conjunction with ICCV 2025. The 2025 edition features a newly introduced track, Complex VOS (MOSEv2). We summarize datasets and protocols, highlight top-performing solutions, and distill emerging trends.
arXiv Detail & Related papers (2025-10-13T07:02:09Z) - The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg [19.13013862040698]
Video Object Segmentation (VOS) aims to track and segment specific objects across entire video sequences. In this paper, we present our improved method, Confidence-Guided Fusion Segmentation (CGFSeg), for the VOS task in the MOSEv1 Challenge. Our method achieves a J&F score of 86.37% on the test set, ranking 1st in the MOSEv1 Challenge at LSVOS 2025.
arXiv Detail & Related papers (2025-09-30T03:50:56Z) - 2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC [46.76209037655681]
Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, given a first-frame mask. The SeC framework establishes a deep semantic understanding of the object for more persistent segmentation. SeC achieved a J&F score of 39.7 on the test set and ranked 2nd place in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.
arXiv Detail & Related papers (2025-09-28T12:26:03Z) - The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC [59.53390730730018]
The solution achieves a J&F score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
arXiv Detail & Related papers (2025-09-23T15:58:13Z) - Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution [8.540105031750434]
We propose a framework that integrates the strengths of Cutie and SAM2. We achieve 3rd place in the MOSEv2 track of the 7th LSVOS Challenge. This demonstrates the effectiveness of enriched feature representation and motion prediction for robust video object segmentation.
arXiv Detail & Related papers (2025-09-19T09:11:01Z) - HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback [0.0]
We introduce HQ-SMem, a method for High-Quality video segmentation and tracking using Smart Memory. Our approach incorporates three key innovations: (i) leveraging SAM with High-Quality masks (SAM-HQ) alongside appearance-based candidate selection to refine coarse segmentation masks, resulting in improved object boundaries; (ii) implementing a dynamic smart memory mechanism that selectively stores relevant key frames while discarding redundant ones; and (iii) dynamically updating the appearance model to effectively handle complex topological object variations and reduce drift throughout the video.
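The selective memory idea in (ii) can be illustrated with a toy policy. The class, names, and cosine threshold below are illustrative assumptions, not HQ-SMem's actual implementation: a new frame embedding is stored only if it is sufficiently dissimilar from everything already in the bank, so redundant frames are dropped.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class SmartMemory:
    """Hypothetical selective memory bank: keep only novel frame embeddings."""

    def __init__(self, threshold: float = 0.95, capacity: int = 8):
        self.threshold = threshold  # similarity above this counts as redundant
        self.capacity = capacity    # hard cap on stored embeddings
        self.bank: list[np.ndarray] = []

    def maybe_store(self, emb: np.ndarray) -> bool:
        """Add `emb` unless it nearly duplicates an existing entry."""
        if any(cosine(emb, m) > self.threshold for m in self.bank):
            return False            # redundant: discard
        if len(self.bank) >= self.capacity:
            self.bank.pop(0)        # full: evict the oldest entry
        self.bank.append(emb)
        return True
```

For example, feeding the same embedding twice stores it once; a sufficiently different embedding is admitted, evicting the oldest entry once the capacity cap is hit.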
arXiv Detail & Related papers (2025-07-25T03:28:05Z) - Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation [51.2732688481343]
We introduce Long-RVOS, a large-scale benchmark for long-term referring video object segmentation. Long-RVOS contains 2,000+ videos with an average duration exceeding 60 seconds, covering a variety of objects. Unlike previous benchmarks that rely solely on per-frame spatial evaluation, we introduce two metrics to assess temporal and spatiotemporal consistency.
arXiv Detail & Related papers (2025-05-19T04:52:31Z) - MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection [21.22536962888316]
We present MoSAM, incorporating two key strategies to integrate object motion cues into the model and establish more reliable feature memory. MoSAM achieves state-of-the-art results compared to other competitors.
arXiv Detail & Related papers (2025-04-30T02:19:31Z) - SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree [79.26409013413003]
We introduce SAM2Long, an improved training-free video object segmentation strategy. It considers the segmentation uncertainty within each frame and chooses the video-level optimal results from multiple segmentation pathways. SAM2Long achieves an average improvement of 3.0 points across all 24 head-to-head comparisons.
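The multi-pathway selection SAM2Long describes resembles a beam search over per-frame mask hypotheses. A hypothetical sketch follows; the `Pathway`/`step` names and additive confidence scoring are assumptions for illustration, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Pathway:
    masks: list = field(default_factory=list)  # one chosen mask id per frame
    score: float = 0.0                         # cumulative confidence

def step(pathways: list, candidates: list, beam: int = 3) -> list:
    """Extend every pathway with every per-frame candidate (mask_id, confidence),
    then prune to the `beam` best pathways by cumulative score."""
    expanded = [
        Pathway(p.masks + [mask_id], p.score + conf)
        for p in pathways
        for mask_id, conf in candidates
    ]
    expanded.sort(key=lambda p: p.score, reverse=True)
    return expanded[:beam]

# Usage: start from one empty pathway and feed per-frame mask candidates;
# the highest-scoring pathway at the end is the video-level result.
paths = [Pathway()]
per_frame = [[("a", 0.9), ("b", 0.4)], [("a", 0.2), ("c", 0.8)]]
for candidates in per_frame:
    paths = step(paths, candidates)
best = max(paths, key=lambda p: p.score)
```

Keeping several pathways alive lets a locally low-confidence mask survive if later frames vindicate it, which is what reduces error accumulation over long videos.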
arXiv Detail & Related papers (2024-10-21T17:59:19Z) - 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z) - MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z) - Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
We present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.