NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation
- URL: http://arxiv.org/abs/2308.15266v2
- Date: Mon, 18 Sep 2023 14:46:11 GMT
- Title: NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation
- Authors: Tim Meinhardt and Matt Feiszli and Yuchen Fan and Laura Leal-Taixe and
Rakesh Ranjan
- Abstract summary: Video Instance (VIS) community operated under the common belief that offline methods are generally superior to a frame by frame online processing.
We present a detailed analysis on different processing paradigms and a new end-to-end Video Instance method.
Our NOVIS represents the first near-online VIS approach which avoids any handcrafted trackings.
- Score: 22.200700685751826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Until recently, the Video Instance Segmentation (VIS) community operated
under the common belief that offline methods are generally superior to a frame
by frame online processing. However, the recent success of online methods
questions this belief, in particular, for challenging and long video sequences.
We understand this work as a rebuttal of those recent observations and an
appeal to the community to focus on dedicated near-online VIS approaches. To
support our argument, we present a detailed analysis on different processing
paradigms and the new end-to-end trainable NOVIS (Near-Online Video Instance
Segmentation) method. Our transformer-based model directly predicts
spatio-temporal mask volumes for clips of frames and performs instance tracking
between clips via overlap embeddings. NOVIS represents the first near-online
VIS approach which avoids any handcrafted tracking heuristics. We outperform
all existing VIS methods by large margins and provide new state-of-the-art
results on both YouTube-VIS (2019/2021) and the OVIS benchmarks.
Related papers
- UVIS: Unsupervised Video Instance Segmentation [65.46196594721545]
Videocaption instance segmentation requires classifying, segmenting, and tracking every object across video frames.
We propose UVIS, a novel Unsupervised Video Instance (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining.
Our framework consists of three essential steps: frame-level pseudo-label generation, transformer-based VIS model training, and query-based tracking.
arXiv Detail & Related papers (2024-06-11T03:05:50Z) - DVIS++: Improved Decoupled Framework for Universal Video Segmentation [30.703276476607545]
We present OV-DVIS++, the first open-vocabulary universal video segmentation framework.
By integrating CLIP with DVIS++, we present OV-DVIS++, the first open-vocabulary universal video segmentation framework.
arXiv Detail & Related papers (2023-12-20T03:01:33Z) - TCOVIS: Temporally Consistent Online Video Instance Segmentation [98.29026693059444]
We propose a novel online method for video instance segmentation called TCOVIS.
The core of our method consists of a global instance assignment strategy and a video-temporal enhancement module.
We evaluate our method on four VIS benchmarks and achieve state-of-the-art performance on all benchmarks without bells-and-whistles.
arXiv Detail & Related papers (2023-09-21T07:59:15Z) - Robust Online Video Instance Segmentation with Track Queries [15.834703258232002]
We propose a fully online transformer-based video instance segmentation model that performs comparably to top offline methods on the YouTube-VIS 2019 benchmark.
We show that, when combined with a strong enough image segmentation architecture, track queries can exhibit impressive accuracy while not being constrained to short videos.
arXiv Detail & Related papers (2022-11-16T18:50:14Z) - A Generalized Framework for Video Instance Segmentation [49.41441806931224]
The handling of long videos with complex and occluded sequences has emerged as a new challenge in the video instance segmentation (VIS) community.
We propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks.
We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS)
arXiv Detail & Related papers (2022-11-16T11:17:19Z) - DeVIS: Making Deformable Transformers Work for Video Instance
Segmentation [4.3012765978447565]
Video Instance (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences.
Transformers recently allowed to cast the entire VIS task as a single set-prediction problem.
Deformable attention provides a more efficient alternative but its application to the temporal domain or the segmentation task have not yet been explored.
arXiv Detail & Related papers (2022-07-22T14:27:45Z) - Efficient Video Instance Segmentation via Tracklet Query and Proposal [62.897552852894854]
Video Instance aims to simultaneously classify, segment, and track multiple object instances in videos.
Most clip-level methods are neither end-to-end learnable nor real-time.
This paper proposes EfficientVIS, a fully end-to-end framework with efficient training and inference.
arXiv Detail & Related papers (2022-03-03T17:00:11Z) - Crossover Learning for Fast Online Video Instance Segmentation [53.5613957875507]
We present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.
To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy.
arXiv Detail & Related papers (2021-04-13T06:47:40Z) - End-to-End Video Instance Segmentation with Transformers [84.17794705045333]
Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video.
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy.
arXiv Detail & Related papers (2020-11-30T02:03:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.