Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation
- URL: http://arxiv.org/abs/2010.09466v1
- Date: Mon, 19 Oct 2020 13:08:15 GMT
- Title: Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation
- Authors: Bowen Wang, Liangzhi Li, Yuta Nakashima, Ryo Kawasaki, Hajime
Nagahara, Yasushi Yagi
- Abstract summary: This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner.
We also present a simple yet effective training strategy, which replaces a frame in the video sequence with noise.
- Score: 29.00635219317848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic video segmentation is a key challenge for various applications. This
paper presents a new model named Noisy-LSTM, which is trainable in an
end-to-end manner, with convolutional LSTMs (ConvLSTMs) to leverage the
temporal coherency in video frames. We also present a simple yet effective
training strategy, which replaces a frame in the video sequence with noise. This
strategy spoils the temporal coherency in video frames during training and thus
makes the temporal links in ConvLSTMs unreliable, which may consequently
improve feature extraction from video frames, as well as serve as a regularizer
to avoid overfitting, without requiring extra data annotation or computational
costs. Experimental results demonstrate that the proposed model achieves
state-of-the-art performance on both the Cityscapes and EndoVis2018 datasets.
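The training strategy is straightforward to reproduce: with some probability, one frame in each training clip is replaced by random noise before the clip is fed to the ConvLSTM-based model, while the segmentation labels are left untouched, so the model cannot rely blindly on its temporal links. A minimal PyTorch sketch is shown below; the function name, the `noise_prob` parameter, the Gaussian noise, and the clip shape are illustrative assumptions rather than details taken from the authors' code.

```python
import random
import torch

def replace_frame_with_noise(clip: torch.Tensor, noise_prob: float = 0.5) -> torch.Tensor:
    """Randomly replace one frame of a (T, C, H, W) clip with noise.

    Hypothetical sketch of the paper's noise-injection strategy: corrupting a
    single frame spoils the temporal coherency of the clip and acts as a
    regularizer for the temporal links in the ConvLSTM, at no extra annotation
    or computational cost.
    """
    if random.random() < noise_prob:
        t = random.randrange(clip.shape[0])      # index of the frame to corrupt
        clip = clip.clone()                      # keep the caller's tensor intact
        clip[t] = torch.randn_like(clip[t])      # Gaussian noise in place of the frame
    return clip

# Usage sketch (training time only; frames are left untouched at inference):
# clip = torch.rand(8, 3, 512, 1024)             # (T, C, H, W) video clip
# noisy_clip = replace_frame_with_noise(clip)
# logits = segmentation_model(noisy_clip)        # hypothetical ConvLSTM segmentation model
```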
Related papers
- LLVD: LSTM-based Explicit Motion Modeling in Latent Space for Blind Video Denoising [1.9253333342733672]
This paper introduces a novel algorithm designed for scenarios where noise is introduced during video capture.
We propose the Latent space LSTM Video Denoiser (LLVD), an end-to-end blind denoising model.
Experiments reveal that LLVD demonstrates excellent performance for both synthetic and captured noise.
arXiv Detail & Related papers (2025-01-10T06:20:27Z)
- SyncVIS: Synchronized Video Instance Segmentation [48.75470418596875]
We propose to conduct synchronized modeling via a new framework named SyncVIS.
SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize video-level query with frame-level query embeddings.
The proposed approach achieves state-of-the-art results, demonstrating its effectiveness and generality.
arXiv Detail & Related papers (2024-12-01T16:43:20Z)
- Event-guided Low-light Video Semantic Segmentation [6.938849566816958]
Event cameras can capture motion dynamics, filter out temporal-redundant information, and are robust to lighting conditions.
We propose EVSNet, a lightweight framework that leverages event modality to guide the learning of a unified illumination-invariant representation.
Specifically, we leverage a Motion Extraction Module to extract short-term and long-term temporal motions from event modality and a Motion Fusion Module to integrate image features and motion features adaptively.
arXiv Detail & Related papers (2024-11-01T14:54:34Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the modeling of its spatiotemporal dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Exploiting long-term temporal dynamics for video captioning [40.15826846670479]
We propose a novel approach, namely temporal and spatial LSTM (TS-LSTM), which systematically exploits spatial and temporal dynamics within video sequences.
Experimental results obtained in two public video captioning benchmarks indicate that our TS-LSTM outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-02-22T11:40:09Z)
- Temporal Modulation Network for Controllable Space-Time Video Super-Resolution [66.06549492893947]
Space-time video super-resolution (STVSR) aims to increase the spatial and temporal resolutions of low-resolution and low-frame-rate videos.
Deformable-convolution-based methods have achieved promising STVSR performance, but they can only infer intermediate frames pre-defined in the training stage.
We propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction.
arXiv Detail & Related papers (2021-04-21T17:10:53Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution, and design new knowledge distillation methods to narrow the performance gap between compact and large models.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)