Efficient Video Segmentation Models with Per-frame Inference
- URL: http://arxiv.org/abs/2202.12427v1
- Date: Thu, 24 Feb 2022 23:51:36 GMT
- Title: Efficient Video Segmentation Models with Per-frame Inference
- Authors: Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
- Abstract summary: We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
- Score: 117.97423110566963
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most existing real-time deep models trained with each frame independently may
produce inconsistent results across the temporal axis when tested on a video
sequence. A few methods take the correlations in the video sequence into
account, e.g., by propagating the results to the neighboring frames using
optical flow or extracting frame representations using multi-frame information,
which may lead to inaccurate results or unbalanced latency. In this work, we
focus on improving the temporal consistency without introducing computation
overhead in inference. To this end, we perform inference at each frame.
Temporal consistency is achieved by learning from video frames with extra
constraints during the training phase; no extra computation is introduced for
inference. We propose
several techniques to learn from the video sequence, including a temporal
consistency loss and online/offline knowledge distillation methods. On the task
of semantic video segmentation, balancing accuracy, temporal smoothness,
and efficiency, our proposed method outperforms keyframe-based methods and a
few baseline methods that are trained with each frame independently, on
datasets including Cityscapes, CamVid, and 300VW-Mask. We further apply our
training method to video instance segmentation on YouTube-VIS and develop an
application of portrait matting in video sequences, by segmenting temporally
consistent instance-level trimaps across frames. Experiments show superior
qualitative and quantitative results. Code is available at:
https://git.io/vidseg.
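The abstract names a temporal consistency loss but does not spell it out. A common way to realize such a term is to warp the previous frame's prediction to the current frame with optical flow and penalize the disagreement, so the sketch below is a minimal, hypothetical PyTorch version under that assumption; the function names (warp_with_flow, temporal_consistency_loss), the KL-based penalty, and the occlusion-mask handling are illustrative choices, not the paper's exact formulation.

```python
# Hypothetical sketch of a flow-warped temporal consistency loss in PyTorch.
# The warping, occlusion masking, and KL penalty are assumptions, not the
# paper's exact formulation.
import torch
import torch.nn.functional as F


def warp_with_flow(prev_logits, flow):
    """Warp frame t-1 logits to frame t using backward optical flow.

    prev_logits: (N, C, H, W) segmentation logits for frame t-1
    flow:        (N, 2, H, W) flow from frame t to frame t-1, in pixels
    """
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # where to sample in frame t-1
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    grid = torch.stack(                    # normalize coordinates to [-1, 1]
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(prev_logits, grid, align_corners=True)


def temporal_consistency_loss(cur_logits, prev_logits, flow, valid_mask=None):
    """KL between the warped t-1 prediction and the t prediction, per pixel."""
    warped = warp_with_flow(prev_logits.detach(), flow)  # no gradient into t-1
    cur_logp = F.log_softmax(cur_logits, dim=1)
    warped_p = F.softmax(warped, dim=1)
    kl = (warped_p * (warped_p.clamp_min(1e-8).log() - cur_logp)).sum(dim=1)
    if valid_mask is not None:                           # drop occluded pixels
        return (kl * valid_mask).sum() / valid_mask.sum().clamp_min(1.0)
    return kl.mean()
```

In training, a term like this would be added to the usual per-frame cross-entropy with a small weight, alongside the online/offline distillation terms mentioned in the abstract; only the single-frame model runs at test time, so per-frame inference cost is unchanged.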
Related papers
- FIFO-Diffusion: Generating Infinite Videos from Text without Training [44.65468310143439]
FIFO-Diffusion is conceptually capable of generating infinitely long videos without additional training.
Our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail.
We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines.
arXiv Detail & Related papers (2024-05-19T07:48:41Z) - Video alignment using unsupervised learning of local and global features [0.0]
We introduce an unsupervised method for alignment that uses global and local features of the frames.
In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and VGG network.
The main advantage of our approach is that no training is required, which makes it applicable for any new type of action without any need to collect training samples for it.
arXiv Detail & Related papers (2023-04-13T22:20:54Z) - A Perceptual Quality Metric for Video Frame Interpolation [6.743340926667941]
As video frame interpolation results often exhibit unique artifacts, existing quality metrics are sometimes not consistent with human perception when measuring these results.
Some recent deep learning-based quality metrics are shown to be more consistent with human judgments, but their performance on videos is compromised since they do not consider temporal information.
Our method learns perceptual features directly from videos instead of individual frames.
arXiv Detail & Related papers (2022-10-04T19:56:10Z) - Revealing Single Frame Bias for Video-and-Language Learning [115.01000652123882]
We show that a single-frame trained model can achieve better performance than existing methods that use multiple frames for training.
This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets.
We propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling.
arXiv Detail & Related papers (2022-06-07T16:28:30Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary across different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed (a minimal distillation sketch follows this list).
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
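The last entry above is the conference version of this paper and mentions new knowledge distillation methods for closing the gap between compact and large models. As a reference point only, here is a minimal, hypothetical sketch of the simplest such term, a pixel-wise distillation loss from a frozen teacher to the compact student; the function name pixelwise_distillation_loss, the temperature, and the KL form are assumptions, and the paper's online/offline schemes are presumably more involved than this single-frame term.

```python
# Hypothetical sketch of a pixel-wise distillation loss for segmentation in
# PyTorch. The temperature and KL choice are assumptions, not the paper's
# specific online/offline schemes.
import torch.nn.functional as F


def pixelwise_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student class distributions per pixel.

    student_logits, teacher_logits: (N, C, H, W) segmentation logits.
    """
    t = temperature
    c = student_logits.shape[1]
    student_logp = F.log_softmax(student_logits / t, dim=1)
    teacher_p = F.softmax(teacher_logits.detach() / t, dim=1)  # teacher is frozen
    kl = F.kl_div(
        student_logp.permute(0, 2, 3, 1).reshape(-1, c),  # (N*H*W, C) log-probs
        teacher_p.permute(0, 2, 3, 1).reshape(-1, c),     # (N*H*W, C) probs
        reduction="batchmean",
    )
    return kl * (t * t)  # standard temperature scaling of the loss magnitude
```

At training time this term would be added to the compact model's segmentation loss; at inference only the compact student runs, so the per-frame cost is unaffected.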