In Defense of Online Models for Video Instance Segmentation
- URL: http://arxiv.org/abs/2207.10661v1
- Date: Thu, 21 Jul 2022 17:56:54 GMT
- Title: In Defense of Online Models for Video Instance Segmentation
- Authors: Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
- Abstract summary: We propose an online framework based on contrastive learning that is able to learn more discriminative instance embeddings for association.
Despite its simplicity, our method outperforms all online and offline methods on three benchmarks.
The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge.
- Score: 70.16915119724757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, video instance segmentation (VIS) has been largely advanced
by offline models, while online models have gradually attracted less attention,
possibly due to their inferior performance. However, online methods have an
inherent advantage in handling long video sequences and ongoing videos, where
offline models fail due to limited computational resources. It would therefore
be highly desirable if online models could achieve performance comparable to,
or even better than, that of offline models. By dissecting current online and
offline models, we demonstrate that the main cause of the performance gap is
error-prone association between frames, caused by the similar appearance of
different instances in the feature space. Observing this, we propose an
online framework based on contrastive learning that is able to learn more
discriminative instance embeddings for association and to fully exploit history
information for stability. Despite its simplicity, our method outperforms all
online and offline methods on three benchmarks. Specifically, we achieve 49.5
AP on YouTube-VIS 2019, a significant improvement of 13.2 AP and 2.1 AP over
the prior online and offline art, respectively. Moreover, we achieve 30.2 AP on
OVIS, a more challenging dataset with significant crowding and occlusions,
surpassing the prior art by 14.8 AP. The proposed method won first place in the
video instance segmentation track of the 4th Large-scale Video Object
Segmentation Challenge (CVPR 2022). We hope the simplicity and effectiveness of
our method, as well as our insight into current methods, can shed light on
the exploration of VIS models.
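As a concrete illustration of the contrastive idea above: pulling embeddings of the same instance together across frames while pushing different instances apart is commonly implemented as an InfoNCE-style loss. The sketch below is a minimal PyTorch rendition under that assumption; the function name, the pair sampling, and the temperature are illustrative choices, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_association_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss over instance embeddings (illustrative sketch).

    anchor:    (D,)   embedding of an instance in the key frame
    positive:  (D,)   embedding of the SAME instance in a reference frame
    negatives: (N, D) embeddings of OTHER instances
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)   # (1,) cosine similarity
    neg_sim = negatives @ anchor                          # (N,) similarities
    logits = torch.cat([pos_sim, neg_sim]) / temperature  # (1 + N,)

    # The positive pair sits at index 0, so minimizing cross-entropy
    # pulls the same instance together and pushes the others apart.
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)

# Toy usage with random 128-d embeddings:
loss = contrastive_association_loss(torch.randn(128), torch.randn(128),
                                    torch.randn(10, 128))
```

Embeddings trained this way stay well separated in feature space, which is precisely what makes the frame-to-frame association less error-prone.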
Related papers
- TCOVIS: Temporally Consistent Online Video Instance Segmentation [98.29026693059444]
We propose a novel online method for video instance segmentation called TCOVIS.
The core of our method consists of a global instance assignment strategy and a video-temporal enhancement module.
We evaluate our method on four VIS benchmarks and achieve state-of-the-art performance on all of them without bells and whistles.
arXiv Detail & Related papers (2023-09-21T07:59:15Z)
- CTVIS: Consistent Training for Online Video Instance Segmentation [62.957370691452844]
Discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Recent online VIS methods leverage contrastive items (CIs) sourced from one reference frame only, which we argue is insufficient for learning highly discriminative embeddings.
We propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines.
arXiv Detail & Related papers (2023-07-24T08:44:25Z)
- OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation [75.07460026246582]
Referring video object segmentation (RVOS) aims at segmenting an object in a video following a human instruction.
Current state-of-the-art methods fall into an offline pattern, in which each clip independently interacts with the text embedding.
We propose a simple yet effective online model using explicit query propagation, named OnlineRefer.
arXiv Detail & Related papers (2023-07-18T15:43:35Z)
- Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation [83.65774845267622]
Tube-Link is a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture.
Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks.
arXiv Detail & Related papers (2023-03-22T17:52:11Z)
- Offline-to-Online Knowledge Distillation for Video Instance Segmentation [13.270872063217022]
We present offline-to-online knowledge distillation (OOKD) for video instance segmentation (VIS).
Our method transfers a wealth of video knowledge from an offline model to an online model for consistent prediction.
Our method also achieves state-of-the-art performance on YTVIS-21, YTVIS-22, and OVIS datasets, with mAP scores of 46.1%, 43.6%, and 31.1%, respectively.
arXiv Detail & Related papers (2023-02-15T08:24:37Z)
- InstanceFormer: An Online Video Instance Segmentation Framework [21.760243214387987]
We propose an efficient, single-stage, transformer-based online VIS framework named InstanceFormer.
We propose three novel components to model short-term and long-term dependencies and temporal coherence.
The proposed InstanceFormer outperforms previous online methods by a large margin across multiple benchmark datasets.
arXiv Detail & Related papers (2022-08-22T18:54:18Z)
- Learning Online for Unified Segmentation and Tracking Models [30.146300294418516]
TrackMLP is a novel meta-learning method optimized to learn from only partial information.
We show that our model achieves state-of-the-art performance and tangible improvement over competing models.
arXiv Detail & Related papers (2021-11-12T23:52:59Z)
- Crossover Learning for Fast Online Video Instance Segmentation [53.5613957875507]
We present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance in other frames at the pixel level.
To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy; a generic sketch of this kind of embedding-based cross-frame association follows this list.
arXiv Detail & Related papers (2021-04-13T06:47:40Z)
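A pattern shared by the online methods above (e.g., CTVIS, InstanceFormer, CrossVIS) is the inference-time association step: embeddings of the current frame's detections are matched against a memory of tracked instances, and the memory is updated so that history stabilizes future matches. The following is a generic, hypothetical sketch of that step; the `EmbeddingTracker` class, the greedy matching, the threshold, and the momentum update are expository assumptions rather than code from any of these papers.

```python
import torch
import torch.nn.functional as F

class EmbeddingTracker:
    """Greedy cross-frame association by cosine similarity with a
    momentum-averaged embedding memory (hypothetical sketch)."""

    def __init__(self, match_thresh=0.5, momentum=0.9):
        self.match_thresh = match_thresh
        self.momentum = momentum
        self.memory = {}   # track_id -> (D,) embedding
        self.next_id = 0

    def update(self, embeddings):
        """Associate current-frame detections with existing tracks.

        embeddings: (M, D) tensor, one row per detection.
        Returns a list of M track ids.
        """
        embeddings = F.normalize(embeddings, dim=-1)
        ids = [None] * embeddings.shape[0]
        if self.memory:
            track_ids = list(self.memory)
            mem = F.normalize(torch.stack([self.memory[t] for t in track_ids]),
                              dim=-1)
            sim = embeddings @ mem.T   # (M, T) cosine similarities
            for _ in range(min(sim.shape)):
                m, t = divmod(int(sim.argmax()), sim.shape[1])
                if sim[m, t] < self.match_thresh:
                    break   # remaining pairs are too dissimilar
                tid = track_ids[t]
                ids[m] = tid
                # Momentum update: history information stabilizes the memory.
                self.memory[tid] = (self.momentum * self.memory[tid]
                                    + (1 - self.momentum) * embeddings[m])
                sim[m, :] = -1.0   # remove this detection and this track
                sim[:, t] = -1.0   # from further matching
        for m, tid in enumerate(ids):
            if tid is None:   # unmatched detection starts a new track
                ids[m] = self.next_id
                self.memory[self.next_id] = embeddings[m].clone()
                self.next_id += 1
        return ids
```

In practice, systems of this kind often combine the embedding similarity with additional cues (box IoU, class consistency) and may replace the greedy loop with Hungarian matching.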
This list is automatically generated from the titles and abstracts of the papers on this site.