A Generalized Framework for Video Instance Segmentation
- URL: http://arxiv.org/abs/2211.08834v2
- Date: Fri, 24 Mar 2023 15:26:13 GMT
- Title: A Generalized Framework for Video Instance Segmentation
- Authors: Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh,
Joon-Young Lee, Seon Joo Kim
- Abstract summary: The handling of long videos with complex and occluded sequences has emerged as a new challenge in the video instance segmentation (VIS) community.
We propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks.
We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The handling of long videos with complex and occluded sequences has recently
emerged as a new challenge in the video instance segmentation (VIS) community.
However, existing methods have limitations in addressing this challenge. We
argue that the biggest bottleneck in current approaches is the discrepancy
between training and inference. To effectively bridge this gap, we propose a
Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art
performance on challenging benchmarks without designing complicated
architectures or requiring extra post-processing. The key contribution of
GenVIS is the learning strategy, which includes a query-based training pipeline
for sequential learning with a novel target label assignment. Additionally, we
introduce a memory that effectively acquires information from previous states.
Thanks to the new perspective, which focuses on building relationships between
separate frames or clips, GenVIS can be flexibly executed in both online and
semi-online manners. We evaluate our approach on popular VIS benchmarks,
achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded
VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS
benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available
at https://github.com/miranheo/GenVIS.
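The abstract describes two ingredients at a high level: propagating instance queries across frames or clips, and a memory that carries information from previous states. The toy sketch below illustrates that general idea only; the function names, the cosine-similarity matching, the greedy assignment, the 0.5 threshold, and the running-average memory update are all illustrative assumptions, not the GenVIS implementation.

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def associate(memory, detections, sim_threshold=0.5):
    """Greedily match each detection embedding in the current frame to
    the most similar remembered instance; otherwise start a new track.

    memory: dict mapping instance id -> stored embedding (list of floats).
    detections: list of per-frame detection embeddings.
    Returns the instance id assigned to each detection, in order.
    """
    assignments = []
    for det in detections:
        best_id, best_sim = None, -1.0
        for inst_id, emb in memory.items():
            sim = cosine(emb, det)
            if sim > best_sim:
                best_id, best_sim = inst_id, sim
        if best_id is None or best_sim < sim_threshold:
            best_id = len(memory)              # start a new track
            memory[best_id] = list(det)
        else:
            # memory update: blend stored state with the new observation
            memory[best_id] = [0.9 * m + 0.1 * d
                               for m, d in zip(memory[best_id], det)]
        assignments.append(best_id)
    return assignments
```

Because the memory is updated after every frame, the same function can run fully online (one frame at a time) or semi-online (one clip's detections at a time), mirroring the flexibility the abstract claims.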
Related papers
- UVIS: Unsupervised Video Instance Segmentation [65.46196594721545]
Video instance segmentation requires classifying, segmenting, and tracking every object across video frames.
We propose UVIS, a novel Unsupervised Video Instance Segmentation framework that can perform video instance segmentation without any video annotations or dense label-based pretraining.
Our framework consists of three essential steps: frame-level pseudo-label generation, transformer-based VIS model training, and query-based tracking.
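The first of the three steps above, frame-level pseudo-label generation, can be illustrated with a toy confidence filter: keep per-frame predictions above a score threshold and treat them as training targets. The record format and the 0.7 threshold are assumptions for illustration, not details from the UVIS paper.

```python
def generate_pseudo_labels(frame_predictions, conf_threshold=0.7):
    """Filter raw per-frame predictions into pseudo-labels.

    frame_predictions: list of dicts with 'mask', 'label', and 'score'.
    Returns only the confident predictions, with the score stripped,
    so they can serve as supervision for a downstream VIS model.
    """
    return [
        {"mask": p["mask"], "label": p["label"]}
        for p in frame_predictions
        if p["score"] >= conf_threshold
    ]
```

In a pipeline like the one sketched in the abstract, these per-frame pseudo-labels would then supervise model training, with a separate query-based tracking step linking instances across frames.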
arXiv Detail & Related papers (2024-06-11T03:05:50Z) - TCOVIS: Temporally Consistent Online Video Instance Segmentation [98.29026693059444]
We propose a novel online method for video instance segmentation called TCOVIS.
The core of our method consists of a global instance assignment strategy and a video-temporal enhancement module.
We evaluate our method on four VIS benchmarks and achieve state-of-the-art performance on all benchmarks without bells-and-whistles.
arXiv Detail & Related papers (2023-09-21T07:59:15Z) - NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation [22.200700685751826]
The Video Instance Segmentation (VIS) community has operated under the common belief that offline methods are generally superior to frame-by-frame online processing.
We present a detailed analysis of different processing paradigms and a new end-to-end Video Instance Segmentation method.
Our NOVIS represents the first near-online VIS approach which avoids any handcrafted tracking heuristics.
arXiv Detail & Related papers (2023-08-29T12:51:04Z) - CTVIS: Consistent Training for Online Video Instance Segmentation [62.957370691452844]
Discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Recent online VIS methods leverage contrastive items (CIs) sourced from only one reference frame, which we argue is insufficient for learning highly discriminative embeddings.
We propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines.
arXiv Detail & Related papers (2023-07-24T08:44:25Z) - DVIS: Decoupled Video Instance Segmentation Framework [15.571072365208872]
Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing.
Existing methods often underperform on complex and long videos in the real world, primarily due to two factors.
We propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement.
arXiv Detail & Related papers (2023-06-06T05:24:15Z) - MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training [84.81566912372328]
MinVIS is a minimal video instance segmentation framework.
It achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures.
arXiv Detail & Related papers (2022-08-03T17:50:42Z) - Crossover Learning for Fast Online Video Instance Segmentation [53.5613957875507]
We present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.
To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy.
arXiv Detail & Related papers (2021-04-13T06:47:40Z)