InsPro: Propagating Instance Query and Proposal for Online Video
Instance Segmentation
- URL: http://arxiv.org/abs/2301.01882v1
- Date: Thu, 5 Jan 2023 02:41:20 GMT
- Title: InsPro: Propagating Instance Query and Proposal for Online Video
Instance Segmentation
- Authors: Fei He, Haoyang Zhang, Naiyu Gao, Jian Jia, Yanhu Shan, Xin Zhao,
Kaiqi Huang
- Abstract summary: Video instance segmentation (VIS) aims at segmenting and tracking objects in videos.
Prior methods generate frame-level or clip-level object instances first and then associate them by either additional tracking heads or complex instance matching algorithms.
In this paper, we design a simple, fast and yet effective query-based framework for online VIS.
- Score: 41.85216306978024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video instance segmentation (VIS) aims at segmenting and tracking objects in
videos. Prior methods typically generate frame-level or clip-level object
instances first and then associate them by either additional tracking heads or
complex instance matching algorithms. This explicit instance association
approach increases system complexity and fails to fully exploit temporal cues
in videos. In this paper, we design a simple, fast and yet effective
query-based framework for online VIS. Relying on an instance query and proposal
propagation mechanism with several specially developed components, this
framework can perform accurate instance association implicitly. Specifically,
we generate frame-level object instances based on a set of instance
query-proposal pairs propagated from previous frames. This instance
query-proposal pair is learned to bind with one specific object across frames
through carefully developed strategies. When such a pair is used to predict
an object instance on the current frame, not only is the generated instance
automatically associated with its precursors on previous frames, but the model
also gets a good prior for predicting the same object. In this way, we naturally
achieve implicit instance association in parallel with segmentation and
elegantly take advantage of temporal cues in videos. To show the effectiveness
of our method InsPro, we evaluate it on two popular VIS benchmarks, i.e.,
YouTube-VIS 2019 and YouTube-VIS 2021. Without bells-and-whistles, our InsPro
with ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks
respectively, outperforming all other online VIS methods.
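The propagation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learned prediction head is replaced by a hypothetical stand-in (`toy_head`) that simply snaps each propagated proposal to the most-overlapping box in the current frame, and the names `Slot`, `toy_head`, and `run_online` are invented for illustration. What it shows is only the bookkeeping: each query-proposal pair stays in a fixed slot, so the slot index acts as the track identity and no separate association step between frames is needed.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Slot:
    query: List[float]  # toy stand-in for a learned instance query embedding
    proposal: Box       # box prior carried over from the previous frame


def toy_head(slot: Slot, frame_boxes: List[Box]) -> Slot:
    """Hypothetical stand-in for a learned head (assumption, not the paper's).

    It uses the propagated proposal as a prior and moves it to the box in
    the current frame that overlaps it most.
    """
    if not frame_boxes:
        return slot  # empty frame: keep the old prior
    best = max(frame_boxes, key=lambda b: iou(slot.proposal, b))
    if iou(slot.proposal, best) == 0.0:
        return slot  # nothing overlaps the prior: keep it unchanged
    return Slot(query=slot.query, proposal=best)


def run_online(slots: List[Slot], video: List[List[Box]]) -> List[List[Box]]:
    """Process frames one by one, propagating every slot to the next frame."""
    tracks: List[List[Box]] = [[] for _ in slots]
    for frame_boxes in video:
        slots = [toy_head(s, frame_boxes) for s in slots]
        for track_id, s in enumerate(slots):
            tracks[track_id].append(s.proposal)  # slot index is the track ID
    return tracks


# Two objects drifting right; the per-frame box order is shuffled on purpose.
video = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(52, 50, 62, 60), (2, 0, 12, 10)],
    [(4, 0, 14, 10), (54, 50, 64, 60)],
]
initial = [Slot([1.0, 0.0], (0, 0, 10, 10)), Slot([0.0, 1.0], (50, 50, 60, 60))]
tracks = run_online(initial, video)
print(tracks[0])
print(tracks[1])
```

In this toy run each slot follows one object even though the boxes arrive in a different order in every frame, which is the sense in which association is implicit. A real system would replace `toy_head` with a learned network that predicts masks and updates the query as well as the proposal.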
Related papers
- Look Before You Match: Instance Understanding Matters in Video Object
Segmentation [114.57723592870097]
In this paper, we argue that instance understanding matters in video object segmentation (VOS).
We present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
We employ well-learned object queries from the IS branch to inject instance-specific information into the query key, with which the instance-augmented matching is further performed.
arXiv Detail & Related papers (2022-12-13T18:59:59Z)
- Online Video Instance Segmentation via Robust Context Fusion [36.376900904288966]
Video instance segmentation (VIS) aims at classifying, segmenting and tracking object instances in video sequences.
Recent transformer-based neural networks have demonstrated their powerful capability of modeling for the VIS task.
We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.
arXiv Detail & Related papers (2022-07-12T15:04:50Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time among contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- STC: Spatio-Temporal Contrastive Learning for Video Instance
Segmentation [47.28515170195206]
Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video.
Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related operations or 3D convolutions.
We present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst.
arXiv Detail & Related papers (2022-02-08T09:34:26Z)
- Hybrid Instance-aware Temporal Fusion for Online Video Instance
Segmentation [23.001856276175506]
We propose an online video instance segmentation framework with a novel instance-aware temporal fusion method.
Our model achieves the best performance among all online VIS methods.
arXiv Detail & Related papers (2021-12-03T03:37:57Z)
- Video Instance Segmentation with a Propose-Reduce Paradigm [68.59137660342326]
Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos.
Prior methods usually obtain segmentation for a frame or clip first, and then merge the incomplete results by tracking or matching.
We propose a new paradigm -- Propose-Reduce, to generate complete sequences for input videos by a single step.
arXiv Detail & Related papers (2021-03-25T10:58:36Z)
- End-to-End Video Instance Segmentation with Transformers [84.17794705045333]
Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video.
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy.
arXiv Detail & Related papers (2020-11-30T02:03:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.