Per-Clip Video Object Segmentation
- URL: http://arxiv.org/abs/2208.01924v1
- Date: Wed, 3 Aug 2022 09:02:29 GMT
- Title: Per-Clip Video Object Segmentation
- Authors: Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee
- Abstract summary: Recently, memory-based approaches have shown promising results on semi-supervised video object segmentation.
We treat video object segmentation as clip-wise mask propagation.
We propose a new method tailored for per-clip inference.
- Score: 110.08925274049409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, memory-based approaches have shown promising results on semi-supervised
video object segmentation. These methods predict object masks frame-by-frame
with the help of frequently updated memory of the previous mask. Different from
this per-frame inference, we investigate an alternative perspective by treating
video object segmentation as clip-wise mask propagation. In this per-clip
inference scheme, we update the memory with an interval and simultaneously
process a set of consecutive frames (i.e., a clip) between the memory updates. The
scheme provides two potential benefits: accuracy gain by clip-level
optimization and efficiency gain by parallel computation of multiple frames. To
this end, we propose a new method tailored for per-clip inference.
Specifically, we first introduce a clip-wise operation to refine the features
based on intra-clip correlation. In addition, we employ a progressive matching
mechanism for efficient information-passing within a clip. With the synergy of
two modules and a newly proposed per-clip based training, our network achieves
state-of-the-art performance on YouTube-VOS 2018/2019 val (84.6% and 84.6%) and
DAVIS 2016/2017 val (91.9% and 86.1%). Furthermore, our model shows a favorable
speed-accuracy trade-off across varying memory update intervals, offering
considerable flexibility.
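To make the per-clip scheme above concrete, here is a minimal sketch of the inference loop, assuming hypothetical helpers `segment_clip` (segments all frames of a clip against a fixed memory) and `update_memory` (adds a frame/mask pair to the memory bank); it illustrates the scheme only, not the authors' actual implementation.

```python
# Minimal sketch of the per-clip inference loop, assuming hypothetical
# helpers passed in by the caller:
#   segment_clip(memory, clip)         -> list of masks, one per clip frame
#   update_memory(memory, frame, mask) -> new memory bank
# This only illustrates the scheme, not the authors' implementation.

def per_clip_vos(frames, first_mask, segment_clip, update_memory,
                 update_interval=5):
    """Propagate `first_mask` through `frames` clip by clip."""
    memory = update_memory([], frames[0], first_mask)  # memory from frame 0
    masks = [first_mask]

    # Process the remaining frames in clips of `update_interval` frames.
    for start in range(1, len(frames), update_interval):
        clip = frames[start:start + update_interval]

        # All frames in the clip are segmented together against the same
        # fixed memory, enabling parallel computation and clip-level
        # refinement based on intra-clip correlation.
        clip_masks = segment_clip(memory, clip)
        masks.extend(clip_masks)

        # The memory is refreshed only once per clip, at the clip boundary.
        memory = update_memory(memory, clip[-1], clip_masks[-1])

    return masks
```

Setting `update_interval=1` recovers per-frame propagation; larger intervals let more frames be processed in parallel against a fixed memory, which is the speed-accuracy trade-off the abstract refers to.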
Related papers
- Multi-grained Temporal Prototype Learning for Few-shot Video Object
Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video belonging to the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information to handle the temporal correlation inherent in video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving temporal consistency without introducing computational overhead at inference time.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
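As a rough illustration of what a temporal consistency loss can look like, the sketch below penalizes disagreement between the prediction at frame t and the motion-warped prediction from frame t-1; the `warp` helper and the simple L2 penalty are assumptions, not the authors' exact formulation.

```python
import numpy as np

def temporal_consistency_loss(prob_t, prob_prev, flow, warp):
    """Penalize disagreement between frame t and the warped frame t-1.

    prob_t, prob_prev: (H, W, C) per-pixel class probabilities.
    flow:              (H, W, 2) motion from frame t-1 to frame t.
    warp:              hypothetical helper that warps prob_prev into
                       frame t's coordinates using the flow.
    """
    warped_prev = warp(prob_prev, flow)                 # align previous prediction
    return float(np.mean((prob_t - warped_prev) ** 2))  # simple L2 penalty
```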
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Efficient Video Object Segmentation with Compressed Video [36.192735485675286]
We propose an efficient framework for semi-supervised video object segmentation by exploiting the temporal redundancy of the video.
Our method performs inference on selected keyframes and makes predictions for other frames via propagation based on motion vectors and residuals from the compressed video bitstream.
Using STM with top-k filtering as our base model, we achieve highly competitive results on DAVIS16 and YouTube-VOS, with speedups of up to 4.9X and little loss in accuracy.
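One simple way to picture the propagation step is to warp the nearest keyframe's mask using motion vectors decoded from the bitstream; the sketch below assumes a dense per-pixel motion field and omits the residual-based correction, so it is an illustration rather than the paper's procedure.

```python
import numpy as np

def propagate_mask(key_mask, motion_vectors):
    """Warp a keyframe mask to a non-keyframe using decoded motion vectors.

    key_mask:       (H, W) soft mask from the nearest keyframe.
    motion_vectors: (H, W, 2) per-pixel displacement (dx, dy) decoded from
                    the compressed bitstream (upsampled from block level).
    Simplified illustration only; no residual-based correction is applied.
    """
    h, w = key_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each target pixel pulls its value from the referenced source pixel.
    src_y = np.clip(ys + motion_vectors[..., 1].round().astype(int), 0, h - 1)
    src_x = np.clip(xs + motion_vectors[..., 0].round().astype(int), 0, w - 1)
    return key_mask[src_y, src_x]
```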
arXiv Detail & Related papers (2021-07-26T12:57:04Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
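The associative aggregation described here can be sketched as a softmax-weighted memory readout, where every query location attends over all memory locations; the dot-product affinity and softmax normalization below are generic choices, not necessarily those of the paper.

```python
import numpy as np

def memory_readout(query_keys, memory_keys, memory_values, temperature=1.0):
    """Aggregate memory value features for every query location.

    query_keys:    (Nq, Ck) key features of the current query frame.
    memory_keys:   (Nm, Ck) key features of all memorized frames.
    memory_values: (Nm, Cv) value features of all memorized frames.
    Returns (Nq, Cv): each query node is a softmax-weighted sum over all
    memory nodes, so every memory node has a chance to contribute.
    """
    affinity = query_keys @ memory_keys.T / temperature   # (Nq, Nm)
    affinity -= affinity.max(axis=1, keepdims=True)       # stabilize softmax
    weights = np.exp(affinity)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory_values
```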
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Video Instance Segmentation using Inter-Frame Communication Transformers [28.539742250704695]
Recently, the per-clip pipeline shows superior performance over per-frame methods.
Previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communications.
We propose Inter-frame Communication Transformers (IFC), which significantly reduces the overhead for information-passing between frames.
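One way to picture how frame-to-frame communication overhead can be reduced is to exchange a handful of summary tokens per frame instead of dense all-pairs attention; the chunk-mean pooling below is a toy stand-in for learned communication tokens, and all names here are illustrative assumptions.

```python
import numpy as np

def communication_tokens(frame_features, num_tokens=8):
    """Summarize each frame with a few tokens for cheap cross-frame exchange.

    frame_features: list of (N_i, C) per-frame feature matrices, where each
                    frame is assumed to have at least `num_tokens` rows.
    Returns (T * num_tokens, C): a small shared pool that every frame can
    attend to, instead of attending to all features of every other frame.
    The tokens here are simple chunk means; a real model would learn them.
    """
    pooled = []
    for feats in frame_features:
        chunks = np.array_split(feats, num_tokens, axis=0)  # coarse grouping
        pooled.extend(chunk.mean(axis=0) for chunk in chunks)
    return np.stack(pooled)
```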
arXiv Detail & Related papers (2021-06-07T02:08:39Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
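As a loose sketch of the collaborative memory idea, the code below fuses each sampled clip's embedding with a memory shared across all clips of the same video before classification; the mean-pooling fusion and fixed 0.5 weighting are assumptions, not the paper's learned mechanism.

```python
import numpy as np

def collaborative_memory_logits(clip_embeddings, classifier_weights):
    """Fuse information across multiple sampled clips of one video.

    clip_embeddings:    (K, C) one embedding per sampled clip.
    classifier_weights: (C, num_classes) shared linear classifier.
    Each clip's logits depend on its own embedding plus a memory shared by
    all clips, so context beyond a single clip influences every prediction.
    """
    shared_memory = clip_embeddings.mean(axis=0, keepdims=True)  # (1, C)
    fused = 0.5 * clip_embeddings + 0.5 * shared_memory          # broadcast add
    return fused @ classifier_weights                            # (K, num_classes)
```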
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
Consecutive frames often change very little; we exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path -- computing a full network or reusing previous frame's feature -- to choose.
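The gating idea can be illustrated with a cheap frame-difference estimate that decides, per frame, whether to run the full segmentation network or to reuse the previous frame's features; the pixel-difference gate and fixed threshold below are stand-ins for the learned gate described in the paper.

```python
import numpy as np

def segment_with_reuse_gate(frames, run_full_network, reuse_previous,
                            change_threshold=0.05):
    """Choose per frame between full computation and feature reuse.

    run_full_network(frame)                 -> (mask, features): expensive path.
    reuse_previous(frame, prev_features)    -> mask: cheap path.
    The change estimate is a mean absolute pixel difference and the
    threshold is fixed; the paper instead learns this gating decision.
    """
    masks = []
    prev_frame, prev_features = None, None
    for frame in frames:
        change = 1.0 if prev_frame is None else float(
            np.mean(np.abs(frame - prev_frame)))
        if change > change_threshold or prev_features is None:
            mask, prev_features = run_full_network(frame)   # full path
        else:
            mask = reuse_previous(frame, prev_features)     # cheap path
        masks.append(mask)
        prev_frame = frame
    return masks
```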
arXiv Detail & Related papers (2020-12-21T19:40:17Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)