Offline-to-Online Knowledge Distillation for Video Instance Segmentation
- URL: http://arxiv.org/abs/2302.07516v1
- Date: Wed, 15 Feb 2023 08:24:37 GMT
- Title: Offline-to-Online Knowledge Distillation for Video Instance Segmentation
- Authors: Hojin Kim, Seunghun Lee, and Sunghoon Im
- Abstract summary: We present offline-to-online knowledge distillation (OOKD) for video instance segmentation (VIS).
Our method transfers a wealth of video knowledge from an offline model to an online model for consistent prediction.
Our method also achieves state-of-the-art performance on YTVIS-21, YTVIS-22, and OVIS datasets, with mAP scores of 46.1%, 43.6%, and 31.1%, respectively.
- Score: 13.270872063217022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present offline-to-online knowledge distillation (OOKD) for
video instance segmentation (VIS), which transfers a wealth of video knowledge
from an offline model to an online model for consistent prediction. Unlike
previous methods that adopt either an online or an offline model, our single
online model takes advantage of both by distilling offline knowledge. To
transfer knowledge correctly, we propose query filtering and association
(QFA), which filters out queries irrelevant to the exact instances. Our KD
with QFA increases the robustness of feature matching by encoding
object-centric features from a single frame supplemented by long-range global
information. We also propose a simple data augmentation scheme for knowledge
distillation in the VIS task that fairly transfers the knowledge of all classes
into the online model. Extensive experiments show that our method significantly
improves the performance in video instance segmentation, especially for
challenging datasets including long, dynamic sequences. Our method also
achieves state-of-the-art performance on YTVIS-21, YTVIS-22, and OVIS datasets,
with mAP scores of 46.1%, 43.6%, and 31.1%, respectively.
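The core idea of distilling offline query knowledge into an online model can be sketched as follows. This is a minimal, illustrative sketch in plain Python, not the authors' implementation: the function names, the greedy cosine-similarity matching, the filtering threshold, and the MSE distillation loss are all assumptions standing in for the paper's QFA and KD components.

```python
def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv + 1e-8)

def filter_and_match(online_queries, offline_queries, threshold=0.5):
    """Greedy one-to-one matching of online to offline query embeddings.
    Pairs whose similarity falls below the threshold are filtered out,
    loosely mimicking a QFA-style filtering step."""
    pairs, used = [], set()
    for i, q in enumerate(online_queries):
        best_j, best_sim = None, threshold
        for j, k in enumerate(offline_queries):
            if j in used:
                continue
            sim = cosine(q, k)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs

def distill_loss(online_queries, offline_queries, pairs):
    """Mean squared error between matched query embeddings: the online
    model is pulled toward the offline model's (frozen) representations."""
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2
                     for a, b in zip(online_queries[i], offline_queries[j]))
    return total / len(pairs)
```

In a real system the embeddings would be transformer query features, the matching would typically be a Hungarian assignment rather than a greedy loop, and the loss would be backpropagated only through the online branch.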
Related papers
- TCOVIS: Temporally Consistent Online Video Instance Segmentation [98.29026693059444]
We propose a novel online method for video instance segmentation called TCOVIS.
The core of our method consists of a global instance assignment strategy and a video-temporal enhancement module.
We evaluate our method on four VIS benchmarks and achieve state-of-the-art performance on all benchmarks without bells-and-whistles.
arXiv Detail & Related papers (2023-09-21T07:59:15Z)
- CTVIS: Consistent Training for Online Video Instance Segmentation [62.957370691452844]
Discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Recent online VIS methods leverage contrastive items (CIs) sourced from only one reference frame, which we argue is insufficient for learning highly discriminative embeddings.
We propose a simple yet effective training strategy, Consistent Training for Online VIS (CTVIS), which aims to align the training and inference pipelines.
arXiv Detail & Related papers (2023-07-24T08:44:25Z)
- Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation [84.3695480773597]
We propose a new online VIS paradigm named Instance As Identity (IAI).
IAI models temporal information for both detection and tracking in an efficient way.
We conduct extensive experiments on three VIS benchmarks.
arXiv Detail & Related papers (2022-08-05T10:29:30Z)
- Video Mask Transfiner for High-Quality Video Instance Segmentation [102.50936366583106]
Video Mask Transfiner (VMT) is capable of leveraging fine-grained high-resolution features thanks to a highly efficient video transformer structure.
Based on our VMT architecture, we design an automated annotation refinement approach by iterative training and self-correction.
We compare VMT with the most recent state-of-the-art methods on the HQ-YTVIS benchmark, as well as on Youtube-VIS, OVIS, and BDD100K MOTS.
arXiv Detail & Related papers (2022-07-28T11:13:37Z)
- In Defense of Online Models for Video Instance Segmentation [70.16915119724757]
We propose an online framework based on contrastive learning that is able to learn more discriminative instance embeddings for association.
Despite its simplicity, our method outperforms all online and offline methods on three benchmarks.
The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Challenge.
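The contrastive-learning idea in this entry can be sketched with a standard InfoNCE-style loss: an anchor embedding is pulled toward the same instance observed in another frame (the positive) and pushed away from other instances (the negatives). This is an illustrative sketch of the general technique, not this paper's implementation; the function name and temperature value are assumptions.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss over dot-product similarities.
    A low loss means the anchor is closer to its positive (same
    instance in another frame) than to any negative (other instances),
    which is what makes the embeddings useful for association."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    # Similarity logits: positive first, then all negatives.
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Numerically stable cross-entropy with the positive as the target.
    m = max(logits)
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m) + math.log(denom)
```

At inference time, association then reduces to matching each detection to the track whose embedding is most similar.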
arXiv Detail & Related papers (2022-07-21T17:56:54Z)
- Crossover Learning for Fast Online Video Instance Segmentation [53.5613957875507]
We present a novel crossover learning scheme, CrossVIS, that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames.
To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy.
arXiv Detail & Related papers (2021-04-13T06:47:40Z)