The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
- URL: http://arxiv.org/abs/2509.25738v1
- Date: Tue, 30 Sep 2025 03:50:56 GMT
- Title: The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
- Authors: Tingmin Li, Yixuan Li, Yang Yang,
- Abstract summary: Video Object Segmentation (VOS) aims to track and segment specific objects across entire video sequences. In this paper, we present our improved method, Confidence-Guided Fusion Segmentation (CGFSeg), for the VOS task in the MOSEv1 Challenge. Our method achieves a J&F score of 86.37% on the test set, ranking 1st in the MOSEv1 Challenge at LSVOS 2025.
- Score: 19.13013862040698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Object Segmentation (VOS) aims to track and segment specific objects across entire video sequences, yet it remains highly challenging under complex real-world scenarios. The MOSEv1 and LVOS datasets, adopted in the MOSEv1 Challenge at LSVOS 2025, are specifically designed to enhance the robustness of VOS models in complex real-world scenarios, including long-term object disappearances and reappearances, as well as the presence of small and inconspicuous objects. In this paper, we present our improved method, Confidence-Guided Fusion Segmentation (CGFSeg), for the VOS task in the MOSEv1 Challenge. During training, the feature extractor of SAM2 is frozen, while the remaining components are fine-tuned to preserve strong feature extraction ability and improve segmentation accuracy. In the inference stage, we introduce a pixel-check strategy that progressively refines predictions by exploiting the complementary strengths of multiple models, thereby yielding robust final masks. As a result, our method achieves a J&F score of 86.37% on the test set, ranking 1st in the MOSEv1 Challenge at LSVOS 2025. These results highlight the effectiveness of our approach in addressing the challenges of the VOS task in complex scenarios.
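
The abstract does not spell out how the pixel-check strategy works, so the following is only a minimal sketch of one way a confidence-guided, pixel-level fusion of two models could look. Everything in it is an assumption made for illustration: the function name `confidence_guided_fuse`, the use of exactly two models, the `low`/`high` confidence band, and the averaging rule for uncertain pixels are not taken from the paper.

```python
import numpy as np


def confidence_guided_fuse(primary_prob, secondary_prob, low=0.3, high=0.7):
    """Hypothetical pixel-level fusion of two foreground-probability maps.

    primary_prob, secondary_prob: arrays of the same shape with values in [0, 1].
    low, high: assumed band in which the primary model counts as "uncertain".
    Returns a boolean mask of the same shape.
    """
    primary_prob = np.asarray(primary_prob, dtype=np.float64)
    secondary_prob = np.asarray(secondary_prob, dtype=np.float64)
    if primary_prob.shape != secondary_prob.shape:
        raise ValueError("probability maps must have the same shape")

    # Pixels where the primary model is unsure (probability close to 0.5).
    uncertain = (primary_prob > low) & (primary_prob < high)

    # Default: keep the primary model's thresholded decision.
    fused = primary_prob >= 0.5

    # For uncertain pixels only, decide from the mean of both models.
    mean_prob = 0.5 * (primary_prob + secondary_prob)
    fused[uncertain] = mean_prob[uncertain] >= 0.5
    return fused


# Toy 2x2 example: the two off-diagonal pixels are uncertain for the primary model.
primary = np.array([[0.90, 0.55], [0.45, 0.10]])
secondary = np.array([[0.20, 0.10], [0.90, 0.80]])
mask = confidence_guided_fuse(primary, secondary)
```

In this toy example the confident pixels (0.90 and 0.10) keep the primary model's decision regardless of the secondary model, while the two uncertain pixels are decided by the averaged probability. A real system would likely apply such a check per frame and per object, and could chain more than two models.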
Related papers
- LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation [186.14566815158506]
This report presents an overview of the 7th Large-scale Video Object Segmentation (LSVOS) Challenge held in conjunction with ICCV 2025. The 2025 edition features a newly introduced track, Complex VOS (MOSEv2). We summarize datasets and protocols, highlight top-performing solutions, and distill emerging trends.
arXiv Detail & Related papers (2025-10-13T07:02:09Z)
- 2nd Place Report of MOSEv2 Challenge 2025: Concept Guided Video Object Segmentation via SeC [46.76209037655681]
Semi-supervised Video Object Segmentation aims to segment a specified target throughout a video sequence, given a first-frame mask. The SeC framework establishes a deep semantic understanding of the object for more persistent segmentation. SeC achieved 39.7 JFn on the test set and ranked 2nd in the Complex VOS track of the 7th Large-scale Video Object Segmentation Challenge.
arXiv Detail & Related papers (2025-09-28T12:26:03Z)
- The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC [59.53390730730018]
Solution achieves a JF score of 39.89% on the test set, ranking 1st in the MOSEv2 track of the LSVOS Challenge.
arXiv Detail & Related papers (2025-09-23T15:58:13Z)
- SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge [9.131199997701282]
Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
arXiv Detail & Related papers (2025-09-22T08:30:34Z)
- MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes [131.45528437023643]
Video object segmentation (VOS) aims to segment specified target objects throughout a video. To bridge this gap, the coMplex video Object SEgmentation dataset was introduced to facilitate VOS research in complex scenes. We present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions.
arXiv Detail & Related papers (2025-08-07T17:59:27Z)
- Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS [68.47681139026666]
Video object segmentation (VOS) is a crucial task in computer vision.
Current VOS methods struggle with complex scenes and prolonged object motions.
This report introduces a discriminative spatial-temporal VOS model.
arXiv Detail & Related papers (2024-08-29T10:47:17Z)
- LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS [25.894649323139987]
We combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges.
Our approach achieves a J&F score of 0.7952 in the testing phase of LSVOS challenge VOS track, ranking third overall.
arXiv Detail & Related papers (2024-08-20T00:45:13Z)
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z)