The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
- URL: http://arxiv.org/abs/2509.19183v1
- Date: Tue, 23 Sep 2025 15:58:13 GMT
- Title: The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
- Authors: Mingqi Gao, Jingkun Chen, Yunqi Miao, Gengshen Wu, Zhijin Qin, Jungong Han
- Abstract summary: The solution achieves a J&F score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report explores the MOSEv2 track of the LSVOS Challenge, which targets complex semi-supervised video object segmentation. By analysing and adapting SeC, an enhanced SAM-2 framework, we conduct a detailed study of its long-term memory and concept-aware memory, showing that long-term memory preserves temporal continuity under occlusion and reappearance, while concept-aware memory supplies semantic priors that suppress distractors; together, these traits directly address several of MOSEv2's core challenges. Our solution achieves a J&F score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
Related papers
- LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation
This report presents an overview of the 7th Large-scale Video Object Segmentation (LSVOS) Challenge held in conjunction with ICCV 2025. The 2025 edition features a newly introduced track, Complex VOS (MOSEv2). We summarize datasets and protocols, highlight top-performing solutions, and distill emerging trends.
arXiv (2025-10-13T07:02:09Z)
- The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
Video Object Segmentation (VOS) aims to track and segment specific objects across entire video sequences. In this paper, we present our improved method, Confidence-Guided Fusion Segmentation (CGFSeg), for the VOS task in the MOSEv1 Challenge. Our method achieves a J&F score of 86.37% on the test set, ranking 1st in the MOSEv1 Challenge at LSVOS 2025.
arXiv (2025-09-30T03:50:56Z)
- 2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC
Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, given a first-frame mask. The SeC framework established a deep semantic understanding of the object for more persistent segmentation. SeC achieved 39.7 JFn on the test set and ranked 2nd in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.
arXiv (2025-09-28T12:26:03Z)
- SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
arXiv (2025-09-22T08:30:34Z)
- Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track
Complex Video Object Segmentation (VOS) presents significant challenges in accurately segmenting objects across frames. We present our solution for the LSVOS 2025 VOS Track based on the SAM2 framework. We achieve a J&F score of 0.8616 on the MOSE test set (+1.4 points over our SAM2Long baseline), securing 2nd place in the LSVOS 2025 VOS Track.
arXiv (2025-09-18T12:23:51Z)
- MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
Video object segmentation (VOS) aims to segment specified target objects throughout a video. To bridge this gap, the coMplex video Object SEgmentation dataset was introduced to facilitate VOS research in complex scenes. We present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions.
arXiv (2025-08-07T17:59:27Z)
- THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation
Our method combines large-scale visual pretraining from SAM2 with depth-based geometric cues to handle complex scenes and long-term tracking. On the VISOR test set, our method reaches a J&F score of 90.1%.
arXiv (2025-06-07T10:33:16Z)
- Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation
We introduce Long-RVOS, a large-scale benchmark for long-term referring video object segmentation. Long-RVOS contains 2,000+ videos with an average duration exceeding 60 seconds, covering a variety of objects. Unlike previous benchmarks that rely solely on per-frame spatial evaluation, we introduce two metrics to assess temporal and spatiotemporal consistency.
arXiv (2025-05-19T04:52:31Z)
- The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation
We propose a method to enhance the temporal consistency of the referring object segmentation model. Our method placed 2nd in the final ranking of the RVOS Track at the ECCV 2024 LSVOS Challenge.
arXiv (2024-08-22T14:43:02Z)
- LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS
We combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Our approach achieves a J&F score of 0.7952 in the testing phase of the LSVOS Challenge VOS track, ranking third overall.
arXiv (2024-08-20T00:45:13Z)
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames. Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv (2024-06-06T00:56:25Z)
- Full-Duplex Strategy for Video Object Segmentation
The Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS). FSNet performs cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage. We show that FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv (2021-08-06T14:50:50Z)