Frame-To-Frame Consistent Semantic Segmentation
- URL: http://arxiv.org/abs/2008.00948v3
- Date: Thu, 27 Aug 2020 18:14:38 GMT
- Title: Frame-To-Frame Consistent Semantic Segmentation
- Authors: Manuel Rebol, Patrick Knöbelreiter
- Abstract summary: We train a convolutional neural network (CNN) which propagates features through consecutive frames in a video.
Our results indicate that the added temporal information produces a frame-to-frame consistent and more accurate image understanding.
- Score: 2.538209532048867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we aim for temporally consistent semantic segmentation
throughout frames in a video. Many semantic segmentation algorithms process
images individually, which leads to an inconsistent scene interpretation due to
illumination changes, occlusions and other variations over time. To achieve a
temporally consistent prediction, we train a convolutional neural network (CNN)
which propagates features through consecutive frames in a video using a
convolutional long short term memory (ConvLSTM) cell. Besides the temporal
feature propagation, we penalize inconsistencies in our loss function. We show
in our experiments that the performance improves when utilizing video
information compared to single frame prediction. The mean intersection over
union (mIoU) metric on the Cityscapes validation set increases from 45.2 % for
single frames to 57.9 % for video data after adding the ConvLSTM to propagate
features through time in ESPNet. Most importantly, inconsistency decreases
from 4.5 % to 1.3 %, a reduction of 71.1 %. Our results
indicate that the added temporal information produces a frame-to-frame
consistent and more accurate image understanding compared to single frame
processing. Code and videos are available at
https://github.com/mrebol/f2f-consistent-semantic-segmentation
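The mechanism described in the abstract is compact enough to sketch: a ConvLSTM cell carries hidden state from frame to frame on top of a segmentation backbone, and the loss adds a penalty on frame-to-frame changes in the prediction. Below is a minimal PyTorch sketch; the stand-in encoder (the paper uses ESPNet), the layer sizes, and the exact form of the inconsistency term are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch: ConvLSTM-based temporal feature propagation plus an
# inconsistency penalty in the loss. Backbone, sizes, and the exact loss
# form are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all gates are computed with convolutions."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class TemporalSegNet(nn.Module):
    def __init__(self, n_classes=19, feat_ch=32):
        super().__init__()
        # Stand-in encoder; the paper uses ESPNet as the backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.cell = ConvLSTMCell(feat_ch, feat_ch)
        self.head = nn.Conv2d(feat_ch, n_classes, 1)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        B, T, _, H, W = clip.shape
        h = clip.new_zeros(B, self.cell.hid_ch, H // 4, W // 4)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):                        # propagate features in time
            h, c = self.cell(self.encoder(clip[:, t]), (h, c))
            logits.append(F.interpolate(self.head(h), size=(H, W),
                                        mode="bilinear", align_corners=False))
        return torch.stack(logits, dim=1)         # (B, T, n_classes, H, W)

def loss_with_consistency(logits, labels, lam=0.1):
    """Per-frame cross-entropy plus a penalty on frame-to-frame changes of
    the predicted distributions (one plausible inconsistency term)."""
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(0, 1),
                         ignore_index=255)
    p = logits.softmax(dim=2)
    incons = (p[:, 1:] - p[:, :-1]).abs().mean()  # L1 between consecutive frames
    return ce + lam * incons

model = TemporalSegNet()
clip = torch.randn(2, 4, 3, 128, 256)             # B=2 clips of T=4 frames
labels = torch.randint(0, 19, (2, 4, 128, 256))
loss = loss_with_consistency(model(clip), labels)
loss.backward()
```
Unrolling over short clips and backpropagating through time, as in this sketch, is the standard way such a recurrent segmentation model is trained.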
Related papers
- Space-time Reinforcement Network for Video Object Segmentation [16.67780344875854]
Video object segmentation (VOS) networks typically use memory-based methods.
These methods suffer from two issues: 1) Challenging data can destroy the space-time coherence between adjacent video frames, and 2) Pixel-level matching can lead to undesired mismatches.
In this paper, we propose to generate an auxiliary frame between adjacent frames, serving as an implicit short-term temporal reference for the query frame.
arXiv Detail & Related papers (2024-05-07T06:26:30Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
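The reuse pattern in the entry above can be illustrated with a temporal-difference mask: recompute features only where the input changed, reuse the previous frame's features elsewhere. The paper's distortion-aware pruning and feature transformation are not reproduced; the mask threshold and bin size below are arbitrary assumptions.
```python
# Generic sketch of feature reuse gated by a temporal-difference mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())

def segment_video(frames, thresh=0.05):
    """frames: (T, 3, H, W). Returns per-frame features with reuse."""
    feats, prev_frame, prev_feat = [], None, None
    for frame in frames:
        x = frame.unsqueeze(0)
        if prev_feat is None:
            feat = backbone(x)                     # full computation
        else:
            # Coarse mask of spatial bins whose pixels changed noticeably.
            diff = (x - prev_frame).abs().mean(dim=1, keepdim=True)
            mask = (F.max_pool2d(diff, 16) > thresh).float()
            mask = F.interpolate(mask, size=x.shape[-2:])
            # Keep new values only where the frame changed. (A real
            # implementation would skip computation on static bins.)
            feat = mask * backbone(x) + (1 - mask) * prev_feat
        feats.append(feat)
        prev_frame, prev_feat = x, feat
    return torch.cat(feats)

video = torch.randn(5, 3, 64, 64)
features = segment_video(video)                    # (5, 16, 64, 64)
```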
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving temporal consistency without introducing overhead at inference time.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem of missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- Video Instance Segmentation using Inter-Frame Communication Transformers [28.539742250704695]
Recently, the per-clip pipeline shows superior performance over per-frame methods.
Previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communications.
We propose Inter-frame Communication Transformers (IFC), which significantly reduces the overhead for information-passing between frames.
arXiv Detail & Related papers (2021-06-07T02:08:39Z)
- Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing [55.97957664897004]
An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers that map utterances to semantic frames proceeds in three steps.
These models are typically bottlenecked by length prediction.
In our work, we propose non-autoregressive parsers which shift the decoding task from text generation to span prediction.
arXiv Detail & Related papers (2021-04-15T07:02:35Z)
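The shift from text generation to span prediction in the entry above can be sketched as a pointer head that scores utterance positions for each frame slot. The dimensions and the scoring head are illustrative assumptions, not the paper's architecture.
```python
# Generic sketch of span prediction: instead of generating leaf text
# token-by-token, predict start/end indices into the utterance per slot.
import torch
import torch.nn as nn

class SpanPointerHead(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.start_proj = nn.Linear(d_model, d_model)
        self.end_proj = nn.Linear(d_model, d_model)

    def forward(self, slot_states, utterance_states):
        # slot_states: (B, S, D) decoder states, one per frame slot
        # utterance_states: (B, L, D) encoder states, one per input token
        start = self.start_proj(slot_states) @ utterance_states.transpose(1, 2)
        end = self.end_proj(slot_states) @ utterance_states.transpose(1, 2)
        return start, end                           # each (B, S, L)

head = SpanPointerHead()
slots, tokens = torch.randn(2, 3, 64), torch.randn(2, 10, 64)
start, end = head(slots, tokens)
spans = torch.stack([start.argmax(-1), end.argmax(-1)], dim=-1)  # (B, S, 2)
```
Because every slot's span is predicted in parallel, decoding needs no autoregressive loop, which is the source of the speedup.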
- No frame left behind: Full Video Action Recognition [26.37329995193377]
We propose full video action recognition that considers all video frames.
We first cluster all frame activations along the temporal dimension.
We then temporally aggregate the frames in the clusters into a smaller number of representations.
arXiv Detail & Related papers (2021-03-29T07:44:28Z)
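The cluster-then-aggregate scheme in the entry above can be sketched with plain k-means over per-frame activations: all frames contribute, but the classifier only sees a few cluster summaries. The clustering method and sizes are assumptions; the paper's actual aggregation modules are not reproduced.
```python
# Generic sketch: cluster frame activations along time, then aggregate
# each cluster into a single representation.
import torch

def cluster_and_aggregate(frame_feats, k=4, iters=10):
    """frame_feats: (T, D), one activation vector per frame."""
    T, _ = frame_feats.shape
    centers = frame_feats[torch.randperm(T)[:k]].clone()   # init from frames
    for _ in range(iters):
        # Assign each frame to the nearest cluster center.
        assign = torch.cdist(frame_feats, centers).argmin(dim=1)
        for j in range(k):                                  # update centers
            members = frame_feats[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(dim=0)
    return centers                                          # (k, D) summaries

feats = torch.randn(64, 128)            # activations for 64 frames
summaries = cluster_and_aggregate(feats)
video_repr = summaries.mean(dim=0)      # pooled input for a classifier
```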
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for semi-supervised video object segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
Many frames in a video change only slightly; we exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path to take: computing the full network or reusing the previous frame's features.
arXiv Detail & Related papers (2020-12-21T19:40:17Z)
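The reuse gate in the entry above reduces to a cheap module that scores frame-to-frame change and routes each frame down a full or a reuse path. The gate design and threshold below are assumptions for illustration, not the paper's network.
```python
# Generic sketch of a reuse gate routing frames between a full network
# and reuse of the previous frame's features.
import torch
import torch.nn as nn

full_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 8, 3, padding=1))

class ReuseGate(nn.Module):
    """Cheaply scores how much the current frame differs from the last."""
    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(nn.Conv2d(3, 4, 3, stride=4), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1),
                                   nn.Flatten(), nn.Linear(4, 1))

    def forward(self, cur, prev):
        return torch.sigmoid(self.score(cur - prev))        # (B, 1) in [0, 1]

gate = ReuseGate()

def process(frames, thresh=0.5):
    outputs, prev, prev_out = [], None, None
    for frame in frames:
        x = frame.unsqueeze(0)
        if prev is not None and gate(x, prev).item() < thresh:
            out = prev_out                  # cheap path: reuse features
        else:
            out = full_net(x)               # expensive path: full network
        outputs.append(out)
        prev, prev_out = x, out
    return torch.cat(outputs)

video = torch.randn(6, 3, 64, 64)
outs = process(video)                       # (6, 8, 64, 64)
```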
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
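A baseline version of the knowledge distillation mentioned above matches a compact student's per-pixel class distributions to a frozen teacher's. The paper designs new distillation terms that this generic sketch does not reproduce; the stand-in models and weights are assumptions.
```python
# Basic per-pixel knowledge distillation for segmentation: the compact
# student matches a frozen teacher's softened class distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Conv2d(3, 19, 3, padding=1)         # stand-in large model
student = nn.Conv2d(3, 19, 1)                    # stand-in compact model
teacher.requires_grad_(False)

def distill_loss(image, labels, tau=2.0, alpha=0.5):
    s_logits = student(image)
    with torch.no_grad():
        t_logits = teacher(image)
    ce = F.cross_entropy(s_logits, labels)
    # KL divergence between softened distributions, per pixel.
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    return alpha * ce + (1 - alpha) * kd

img = torch.randn(2, 3, 64, 64)
lab = torch.randint(0, 19, (2, 64, 64))
loss = distill_loss(img, lab)
loss.backward()
```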
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines, on the CPU, a very fast optical flow method used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
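The flow-based propagation step of the EVS pipeline above can be sketched with grid_sample: warp the previous frame's class probabilities along the optical flow instead of re-running the segmentation network. The fast CPU flow method and the refinement stages are not reproduced; the flow is assumed given.
```python
# Sketch of flow-based label propagation via backward warping.
import torch
import torch.nn.functional as F

def propagate(prev_probs, flow):
    """prev_probs: (1, C, H, W) class probabilities for frame t-1.
    flow: (1, 2, H, W) pixel offsets mapping frame t back to frame t-1."""
    _, _, H, W = prev_probs.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0)   # (1, H, W, 2)
    grid = grid + flow.permute(0, 2, 3, 1)                      # follow the flow
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(prev_probs, grid, align_corners=True)

prev = torch.rand(1, 19, 128, 256).softmax(dim=1)   # probabilities at t-1
flow = torch.zeros(1, 2, 128, 256)                   # zero flow: identity warp
cur = propagate(prev, flow)
assert torch.allclose(cur, prev, atol=1e-5)
```
Since warping is far cheaper than a forward pass, the network only needs to run on keyframes, which is what makes the high frame-rate operating points possible.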
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.