Efficient Video Object Segmentation with Compressed Video
- URL: http://arxiv.org/abs/2107.12192v2
- Date: Tue, 27 Jul 2021 02:55:16 GMT
- Title: Efficient Video Object Segmentation with Compressed Video
- Authors: Kai Xu and Angela Yao
- Abstract summary: We propose an efficient framework for semi-supervised video object segmentation by exploiting the temporal redundancy of the video.
Our method performs inference on selected keyframes and makes predictions for other frames via propagation based on motion vectors and residuals from the compressed video bitstream.
With STM with top-k filtering as our base model, we achieved highly competitive results on DAVIS16 and YouTube-VOS with substantial speedups of up to 4.9X with little loss in accuracy.
- Score: 36.192735485675286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient inference framework for semi-supervised video object
segmentation by exploiting the temporal redundancy of the video. Our method
performs inference on selected keyframes and makes predictions for other frames
via propagation based on motion vectors and residuals from the compressed video
bitstream. Specifically, we propose a new motion vector-based warping method
for propagating segmentation masks from keyframes to other frames in a
multi-reference manner. Additionally, we propose a residual-based refinement
module that can correct and add detail to the block-wise propagated
segmentation masks. Our approach is flexible and can be added on top of
existing video object segmentation algorithms. With STM with top-k filtering as
our base model, we achieved highly competitive results on DAVIS16 and
YouTube-VOS with substantial speedups of up to 4.9X with little loss in
accuracy.
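The block-wise propagation idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a hypothetical, simplified bitstream representation: one integer (dy, dx) motion vector and one reference-frame index per fixed 16x16 block. Real codecs such as H.264 use variable block sizes and sub-pixel vectors. The sketch copies mask blocks rather than reproducing the paper's multi-reference warping method. The helper names `propagate_mask` and `high_residual_blocks` are illustrative. The residual threshold is a hand-picked stand-in for the paper's learned residual-based refinement module, and it is used here only to flag blocks that likely need correction.

```python
import numpy as np


def propagate_mask(ref_masks, mvs, ref_ids, block=16):
    """Sketch: build the current frame's mask by copying blocks from reference masks.

    ref_masks: list of (H, W) masks of already-segmented frames (e.g. keyframes).
    mvs:       (H//block, W//block, 2) int array; per-block (dy, dx) displacement
               pointing from the current block to its match in the reference frame
               (hypothetical simplified format, not a real codec API).
    ref_ids:   (H//block, W//block) int array; which reference mask each block uses.
    Pixels outside the last full block row/column are left as zero.
    """
    H, W = ref_masks[0].shape
    out = np.zeros((H, W), dtype=ref_masks[0].dtype)
    for by in range(H // block):
        for bx in range(W // block):
            dy, dx = mvs[by, bx]
            ref = ref_masks[ref_ids[by, bx]]
            y0, x0 = by * block, bx * block
            # Source block in the reference mask, clipped so it stays inside the image.
            sy = int(np.clip(y0 + dy, 0, H - block))
            sx = int(np.clip(x0 + dx, 0, W - block))
            out[y0:y0 + block, x0:x0 + block] = ref[sy:sy + block, sx:sx + block]
    return out


def high_residual_blocks(residual, block=16, thresh=10.0):
    """Sketch: flag blocks whose mean absolute residual exceeds a hand-picked threshold.

    Large residuals mean the motion-compensated prediction was poor, so the
    block-wise propagated mask is less trustworthy there (placeholder heuristic,
    not the paper's learned refinement module).
    """
    H, W = residual.shape
    hb, wb = H // block, W // block
    r = np.abs(residual[:hb * block, :wb * block].astype(np.float64))
    # Reshape to (block_row, y_in_block, block_col, x_in_block) and average per block.
    per_block = r.reshape(hb, block, wb, block).mean(axis=(1, 3))
    return per_block > thresh
```

In this sketch, blocks flagged by `high_residual_blocks` are where plain block copying is least reliable; the abstract assigns that correction role to a learned residual-based refinement module, so the threshold here is only a placeholder.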
Related papers
- Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? [22.191260650245443]
Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames.
Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets.
We propose a training strategy, Masked Video Consistency, which enhances spatial and temporal feature aggregation.
(2024-08-20T08:08:32Z)
- Object Segmentation-Assisted Inter Prediction for Versatile Video Coding [53.91821712591901]
We propose an object segmentation-assisted inter prediction method (SAIP), where objects in the reference frames are segmented by some advanced technologies.
With a proper indication, the object segmentation mask is translated from the reference frame to the current frame as the arbitrary-shaped partition of different regions.
We show that the proposed method achieves up to 1.98%, 1.14%, 0.79%, and on average 0.82%, 0.49%, 0.37% BD-rate reduction for common test sequences.
(2024-03-18T11:48:20Z)
- Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called MPVSS.
Our framework reduces FLOPs by up to 4x compared to the per-frame Mask2Former baseline, with only up to 2% mIoU degradation on the Cityscapes validation set.
(2023-10-29T09:55:28Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the same category as a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
(2023-09-20T09:16:34Z)
- Per-Clip Video Object Segmentation [110.08925274049409]
Recently, memory-based approaches have shown promising results on semi-supervised video object segmentation.
We treat video object segmentation as clip-wise mask propagation.
We propose a new method tailored for the per-clip inference.
(2022-08-03T09:02:29Z)
- GSVNet: Guided Spatially-Varying Convolution for Fast Semantic Segmentation on Video [10.19019476978683]
We propose a simple yet efficient propagation framework for video segmentation.
We perform lightweight flow estimation in 1/8-downscaled image space for temporal warping in segmentation output space.
We introduce a guided spatially-varying convolution for fusing segmentations derived from the previous and current frames, to mitigate propagation error.
(2021-03-16T03:38:59Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance, leading to state-of-the-art results in both the VOS domain and the more challenging tracking domain.
(2021-01-06T18:56:24Z)
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
We exploit the observation that many consecutive frames change very little, using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path to take: computing the full network or reusing the previous frame's features.
(2020-12-21T19:40:17Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
(2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.