3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic
Segmentation on VSPW
- URL: http://arxiv.org/abs/2306.02291v2
- Date: Tue, 6 Jun 2023 01:49:09 GMT
- Title: 3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic
Segmentation on VSPW
- Authors: Shijie Chang, Zeqi Hao, Ben Kang, Xiaoqi Zhao, Jiawen Zhu, Zhenyu
Chen, Lihe Zhang, Lu Zhang, Huchuan Lu
- Abstract summary: In this paper, we introduce the 3rd place solution for the PVUW2023 VSS track.
We have explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation.
- Score: 68.56017675820897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the 3rd place solution for the PVUW2023 VSS track.
Semantic segmentation is a fundamental task in computer vision with numerous
real-world applications. We have explored various image-level visual backbones
and segmentation heads to tackle the problem of video semantic segmentation.
Through our experimentation, we find that InternImage-H as the backbone and
Mask2former as the segmentation head achieves the best performance. In
addition, we explore two post-processing methods: CascadePSP and the Segment
Anything Model (SAM). Ultimately, our approach obtains 62.60% and 64.84% mIoU
on the VSPW test set 1 and final test set, respectively, securing the third
position in the PVUW2023 VSS track.
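At inference time the pipeline described above is an image-level model applied frame by frame, with an optional mask-refinement pass afterwards. The following is a minimal, hedged sketch of that structure in plain PyTorch: ToySegmentor, its layer sizes, and the number of queries are illustrative stand-ins rather than the authors' code (the actual solution uses InternImage-H features and a full Mask2Former head), and the CascadePSP/SAM refinement step is only indicated by a comment.

```python
# Minimal sketch only -- not the authors' implementation.
# A tiny convolutional "backbone" stands in for InternImage-H, and a single
# query/mask dot-product stands in for the full Mask2Former head.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySegmentor(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 64, num_queries: int = 16):
        super().__init__()
        # Stand-in for a large image-level backbone such as InternImage-H.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, embed_dim, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(embed_dim, embed_dim, 3, padding=1), nn.ReLU(),
        )
        # Mask2Former-style idea: learned queries each predict a class and a mask
        # via dot products with the pixel embeddings.
        self.queries = nn.Embedding(num_queries, embed_dim)
        self.class_head = nn.Linear(embed_dim, num_classes)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(frame)                              # (B, C, H/4, W/4)
        q = self.queries.weight                                  # (Q, C)
        mask_logits = torch.einsum("qc,bchw->bqhw", q, feat)     # per-query masks
        class_probs = self.class_head(q).softmax(-1)             # (Q, K)
        # Combine query masks and query classes into per-pixel class scores.
        seg = torch.einsum("bqhw,qk->bkhw", mask_logits.sigmoid(), class_probs)
        return F.interpolate(seg, scale_factor=4, mode="bilinear", align_corners=False)


# Per-frame inference over a short clip. In the described solution, a
# post-processing pass (CascadePSP or SAM) would refine mask boundaries here
# before the final per-pixel argmax; that step is omitted in this sketch.
model = ToySegmentor(num_classes=124)   # VSPW annotates 124 semantic categories
clip = torch.randn(8, 3, 64, 64)        # 8 dummy RGB frames
with torch.no_grad():
    preds = torch.stack([model(f.unsqueeze(0)).argmax(1) for f in clip])
print(preds.shape)  # torch.Size([8, 1, 64, 64])
```

The query-based head is the relevant design choice: classification and mask prediction are decoupled per query, which is why the same image-level head transfers to video simply by running it on every frame.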
Related papers
- Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation [9.322345758563886]
We present our solution for semantic segmentation in adverse weather for the UG2+ Challenge at CVPR 2024.
We initialize the InternImage-H backbone with pre-trained weights from the large-scale joint dataset and enhance it with the state-of-the-art Upernet segmentation method.
Our proposed solution demonstrates advanced performance on the test set and achieves 3rd position in this challenge.
arXiv Detail & Related papers (2024-06-09T15:56:35Z)
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z)
- 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [12.274092278786966]
Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video.
We propose a robust integrated video panoptic segmentation solution.
Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases, respectively.
arXiv Detail & Related papers (2024-06-01T17:03:16Z)
- A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track [31.754017006309564]
We propose a framework named UniNet that seamlessly combines various visual perception algorithms into a multi-task model.
Specifically, we choose DETR3D, Mask2Former, and BinsFormer for 3D object detection, instance segmentation, and depth estimation tasks.
The final submission is a single model with InternImage-L backbone, and achieves a 49.6 overall score.
arXiv Detail & Related papers (2024-02-27T08:51:20Z)
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from the rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z)
- 3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation [10.04177400017471]
We propose a robust integrated video panoptic segmentation solution.
In our solution, we represent both semantic and instance targets as a set of queries.
We then combine these queries with video features extracted by neural networks to predict segmentation masks.
arXiv Detail & Related papers (2023-06-11T19:44:40Z)
- 1st Place Solution of The Robust Vision Challenge (RVC) 2022 Semantic Segmentation Track [67.56316745239629]
This report describes the winning solution to the semantic segmentation task of the Robust Vision Challenge at ECCV 2022.
Our method adopts the FAN-B-Hybrid model as the encoder and uses Segformer as the segmentation framework.
The proposed method could serve as a strong baseline for the multi-domain segmentation task and benefit future works.
arXiv Detail & Related papers (2022-10-23T20:52:22Z)
- Mask2Former for Video Instance Segmentation [172.10001340104515]
Mask2Former achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss, or even the training pipeline.
We show that universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes; a toy sketch of this volume prediction appears after this list.
arXiv Detail & Related papers (2021-12-20T18:59:59Z)
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
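Several entries above, the Mask2Former video instance segmentation paper in particular, extend the same query-based decoding along the time axis so that each query directly emits a T x H x W mask volume. A toy sketch of that idea, with all tensor sizes chosen arbitrarily for illustration:

```python
# Toy illustration only -- not code from any of the papers listed above.
import torch

num_queries, embed_dim = 16, 64
T, H, W = 8, 32, 32

queries = torch.randn(num_queries, embed_dim)   # learned query embeddings
video_feat = torch.randn(embed_dim, T, H, W)    # spatio-temporal pixel embeddings

# One dot product per query over the whole clip yields a 3D mask volume, so an
# object keeps the same query (and hence the same identity) across frames
# without a separate tracking module.
mask_volumes = torch.einsum("qc,cthw->qthw", queries, video_feat).sigmoid()
print(mask_volumes.shape)  # torch.Size([16, 8, 32, 32])
```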