Frame-To-Frame Consistent Semantic Segmentation
- URL: http://arxiv.org/abs/2008.00948v3
- Date: Thu, 27 Aug 2020 18:14:38 GMT
- Title: Frame-To-Frame Consistent Semantic Segmentation
- Authors: Manuel Rebol, Patrick Knöbelreiter
- Abstract summary: We train a convolutional neural network (CNN) which propagates features through consecutive frames in a video.
Our results indicate that the added temporal information produces a frame-to-frame consistent and more accurate image understanding.
- Score: 2.538209532048867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we aim for temporally consistent semantic segmentation
throughout frames in a video. Many semantic segmentation algorithms process
images individually, which leads to an inconsistent scene interpretation due to
illumination changes, occlusions and other variations over time. To achieve a
temporally consistent prediction, we train a convolutional neural network (CNN)
which propagates features through consecutive frames in a video using a
convolutional long short term memory (ConvLSTM) cell. Besides the temporal
feature propagation, we penalize inconsistencies in our loss function. We show
in our experiments that the performance improves when utilizing video
information compared to single frame prediction. The mean intersection over
union (mIoU) metric on the Cityscapes validation set increases from 45.2 % for
single frames to 57.9 % for video data after adding the ConvLSTM to propagate
features through time in ESPNet. Most importantly, inconsistency decreases
from 4.5 % to 1.3 %, a reduction of 71.1 %. Our results
indicate that the added temporal information produces a frame-to-frame
consistent and more accurate image understanding compared to single frame
processing. Code and videos are available at
https://github.com/mrebol/f2f-consistent-semantic-segmentation
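The mechanism described in the abstract is compact enough to sketch: a ConvLSTM cell carries hidden state from frame to frame on top of a segmentation backbone, and the loss adds a penalty on frame-to-frame changes in the prediction. Below is a minimal PyTorch sketch; the stand-in encoder (the paper uses ESPNet), the layer sizes, and the exact form of the inconsistency term are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch: ConvLSTM-based temporal feature propagation plus an
# inconsistency penalty in the loss. Backbone, sizes, and the exact loss
# form are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all gates are computed with convolutions."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class TemporalSegNet(nn.Module):
    def __init__(self, n_classes=19, feat_ch=32):
        super().__init__()
        # Stand-in encoder; the paper uses ESPNet as the backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.cell = ConvLSTMCell(feat_ch, feat_ch)
        self.head = nn.Conv2d(feat_ch, n_classes, 1)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        B, T, _, H, W = clip.shape
        h = clip.new_zeros(B, self.cell.hid_ch, H // 4, W // 4)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):                        # propagate features in time
            h, c = self.cell(self.encoder(clip[:, t]), (h, c))
            logits.append(F.interpolate(self.head(h), size=(H, W),
                                        mode="bilinear", align_corners=False))
        return torch.stack(logits, dim=1)         # (B, T, n_classes, H, W)

def loss_with_consistency(logits, labels, lam=0.1):
    """Per-frame cross-entropy plus a penalty on frame-to-frame changes of
    the predicted distributions (one plausible inconsistency term)."""
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(0, 1),
                         ignore_index=255)
    p = logits.softmax(dim=2)
    incons = (p[:, 1:] - p[:, :-1]).abs().mean()  # L1 between consecutive frames
    return ce + lam * incons

model = TemporalSegNet()
clip = torch.randn(2, 4, 3, 128, 256)             # B=2 clips of T=4 frames
labels = torch.randint(0, 19, (2, 4, 128, 256))
loss = loss_with_consistency(model(clip), labels)
loss.backward()
```
Unrolling over short clips and backpropagating through time, as in this sketch, is the standard way such a recurrent segmentation model is trained.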
Related papers
- Space-time Reinforcement Network for Video Object Segmentation [16.67780344875854]
Video object segmentation (VOS) networks typically use memory-based methods.
These methods suffer from two issues: 1) Challenging data can destroy the space-time coherence between adjacent video frames, and 2) Pixel-level matching can lead to undesired mismatches.
In this paper, we propose to generate an auxiliary frame between adjacent frames, serving as an implicit short-term temporal reference for the query frame.
arXiv Detail & Related papers (2024-05-07T06:26:30Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
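The reuse pattern in the entry above can be illustrated with a temporal-difference mask: recompute features only where the input changed, reuse the previous frame's features elsewhere. The paper's distortion-aware pruning and feature transformation are not reproduced; the mask threshold and bin size below are arbitrary assumptions.
```python
# Generic sketch of feature reuse gated by a temporal-difference mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())

def segment_video(frames, thresh=0.05):
    """frames: (T, 3, H, W). Returns per-frame features with reuse."""
    feats, prev_frame, prev_feat = [], None, None
    for frame in frames:
        x = frame.unsqueeze(0)
        if prev_feat is None:
            feat = backbone(x)                     # full computation
        else:
            # Coarse mask of spatial bins whose pixels changed noticeably.
            diff = (x - prev_frame).abs().mean(dim=1, keepdim=True)
            mask = (F.max_pool2d(diff, 16) > thresh).float()
            mask = F.interpolate(mask, size=x.shape[-2:])
            # Keep new values only where the frame changed. (A real
            # implementation would skip computation on static bins.)
            feat = mask * backbone(x) + (1 - mask) * prev_feat
        feats.append(feat)
        prev_frame, prev_feat = x, feat
    return torch.cat(feats)

video = torch.randn(5, 3, 64, 64)
features = segment_video(video)                    # (5, 16, 64, 64)
```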
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving temporal consistency without introducing overhead at inference time.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem of missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- Video Instance Segmentation using Inter-Frame Communication Transformers [28.539742250704695]
Recently, the per-clip pipeline shows superior performance over per-frame methods.
Previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communications.
We propose Inter-frame Communication Transformers (IFC), which significantly reduces the overhead for information-passing between frames.
arXiv Detail & Related papers (2021-06-07T02:08:39Z)
- Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing [55.97957664897004]
An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers that map utterances to semantic frames proceeds in three steps.
These models are typically bottlenecked by length prediction.
In our work, we propose non-autoregressive parsers which shift the decoding task from text generation to span prediction.
arXiv Detail & Related papers (2021-04-15T07:02:35Z)
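The shift from text generation to span prediction in the entry above can be sketched as a pointer head that scores utterance positions for each frame slot. The dimensions and the scoring head are illustrative assumptions, not the paper's architecture.
```python
# Generic sketch of span prediction: instead of generating leaf text
# token-by-token, predict start/end indices into the utterance per slot.
import torch
import torch.nn as nn

class SpanPointerHead(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.start_proj = nn.Linear(d_model, d_model)
        self.end_proj = nn.Linear(d_model, d_model)

    def forward(self, slot_states, utterance_states):
        # slot_states: (B, S, D) decoder states, one per frame slot
        # utterance_states: (B, L, D) encoder states, one per input token
        start = self.start_proj(slot_states) @ utterance_states.transpose(1, 2)
        end = self.end_proj(slot_states) @ utterance_states.transpose(1, 2)
        return start, end                           # each (B, S, L)

head = SpanPointerHead()
slots, tokens = torch.randn(2, 3, 64), torch.randn(2, 10, 64)
start, end = head(slots, tokens)
spans = torch.stack([start.argmax(-1), end.argmax(-1)], dim=-1)  # (B, S, 2)
```
Because every slot's span is predicted in parallel, decoding needs no autoregressive loop, which is the source of the speedup.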
- No frame left behind: Full Video Action Recognition [26.37329995193377]
We propose full video action recognition that considers all video frames.
We first cluster all frame activations along the temporal dimension.
We then temporally aggregate the frames in the clusters into a smaller number of representations.
arXiv Detail & Related papers (2021-03-29T07:44:28Z)
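The cluster-then-aggregate scheme in the entry above can be sketched with plain k-means over per-frame activations: all frames contribute, but the classifier only sees a few cluster summaries. The clustering method and sizes are assumptions; the paper's actual aggregation modules are not reproduced.
```python
# Generic sketch: cluster frame activations along time, then aggregate
# each cluster into a single representation.
import torch

def cluster_and_aggregate(frame_feats, k=4, iters=10):
    """frame_feats: (T, D), one activation vector per frame."""
    T, _ = frame_feats.shape
    centers = frame_feats[torch.randperm(T)[:k]].clone()   # init from frames
    for _ in range(iters):
        # Assign each frame to the nearest cluster center.
        assign = torch.cdist(frame_feats, centers).argmin(dim=1)
        for j in range(k):                                  # update centers
            members = frame_feats[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(dim=0)
    return centers                                          # (k, D) summaries

feats = torch.randn(64, 128)            # activations for 64 frames
summaries = cluster_and_aggregate(feats)
video_repr = summaries.mean(dim=0)      # pooled input for a classifier
```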
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for semi-supervised video object segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
Many frames in a video change only slightly; we exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates change across frames and decides which path to take: computing the full network or reusing the previous frame's features.
arXiv Detail & Related papers (2020-12-21T19:40:17Z)
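The reuse gate in the entry above reduces to a cheap module that scores frame-to-frame change and routes each frame down a full or a reuse path. The gate design and threshold below are assumptions for illustration, not the paper's network.
```python
# Generic sketch of a reuse gate routing frames between a full network
# and reuse of the previous frame's features.
import torch
import torch.nn as nn

full_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 8, 3, padding=1))

class ReuseGate(nn.Module):
    """Cheaply scores how much the current frame differs from the last."""
    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(nn.Conv2d(3, 4, 3, stride=4), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1),
                                   nn.Flatten(), nn.Linear(4, 1))

    def forward(self, cur, prev):
        return torch.sigmoid(self.score(cur - prev))        # (B, 1) in [0, 1]

gate = ReuseGate()

def process(frames, thresh=0.5):
    outputs, prev, prev_out = [], None, None
    for frame in frames:
        x = frame.unsqueeze(0)
        if prev is not None and gate(x, prev).item() < thresh:
            out = prev_out                  # cheap path: reuse features
        else:
            out = full_net(x)               # expensive path: full network
        outputs.append(out)
        prev, prev_out = x, out
    return torch.cat(outputs)

video = torch.randn(6, 3, 64, 64)
outs = process(video)                       # (6, 8, 64, 64)
```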
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
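A baseline version of the knowledge distillation mentioned above matches a compact student's per-pixel class distributions to a frozen teacher's. The paper designs new distillation terms that this generic sketch does not reproduce; the stand-in models and weights are assumptions.
```python
# Basic per-pixel knowledge distillation for segmentation: the compact
# student matches a frozen teacher's softened class distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Conv2d(3, 19, 3, padding=1)         # stand-in large model
student = nn.Conv2d(3, 19, 1)                    # stand-in compact model
teacher.requires_grad_(False)

def distill_loss(image, labels, tau=2.0, alpha=0.5):
    s_logits = student(image)
    with torch.no_grad():
        t_logits = teacher(image)
    ce = F.cross_entropy(s_logits, labels)
    # KL divergence between softened distributions, per pixel.
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    return alpha * ce + (1 - alpha) * kd

img = torch.randn(2, 3, 64, 64)
lab = torch.randint(0, 19, (2, 64, 64))
loss = distill_loss(img, lab)
loss.backward()
```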
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines, on the CPU, a very fast optical flow method used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
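The flow-based propagation step of the EVS pipeline above can be sketched with grid_sample: warp the previous frame's class probabilities along the optical flow instead of re-running the segmentation network. The fast CPU flow method and the refinement stages are not reproduced; the flow is assumed given.
```python
# Sketch of flow-based label propagation via backward warping.
import torch
import torch.nn.functional as F

def propagate(prev_probs, flow):
    """prev_probs: (1, C, H, W) class probabilities for frame t-1.
    flow: (1, 2, H, W) pixel offsets mapping frame t back to frame t-1."""
    _, _, H, W = prev_probs.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0)   # (1, H, W, 2)
    grid = grid + flow.permute(0, 2, 3, 1)                      # follow the flow
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(prev_probs, grid, align_corners=True)

prev = torch.rand(1, 19, 128, 256).softmax(dim=1)   # probabilities at t-1
flow = torch.zeros(1, 2, 128, 256)                   # zero flow: identity warp
cur = propagate(prev, flow)
assert torch.allclose(cur, prev, atol=1e-5)
```
Since warping is far cheaper than a forward pass, the network only needs to run on keyframes, which is what makes the high frame-rate operating points possible.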
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.