Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation
- URL: http://arxiv.org/abs/2010.09466v1
- Date: Mon, 19 Oct 2020 13:08:15 GMT
- Title: Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation
- Authors: Bowen Wang, Liangzhi Li, Yuta Nakashima, Ryo Kawasaki, Hajime
Nagahara, Yasushi Yagi
- Abstract summary: This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner.
We also present a simple yet effective training strategy, which replaces a frame in a video sequence with noise.
- Score: 29.00635219317848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic video segmentation is a key challenge for various applications. This
paper presents a new model named Noisy-LSTM, which is trainable in an
end-to-end manner, with convolutional LSTMs (ConvLSTMs) to leverage the
temporal coherency in video frames. We also present a simple yet effective
training strategy, which replaces a frame in a video sequence with noise. This
strategy spoils the temporal coherency in video frames during training and thus
makes the temporal links in ConvLSTMs unreliable, which may consequently
improve feature extraction from video frames, as well as serve as a regularizer
to avoid overfitting, without requiring extra data annotation or computational
costs. Experimental results demonstrate that the proposed model can achieve
state-of-the-art performance on both the CityScapes and EndoVis2018 datasets.
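The noise-replacement strategy described in the abstract is simple to reproduce in a training loop. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the (B, T, C, H, W) tensor layout, the Gaussian noise distribution, and the replacement probability `p` are illustrative assumptions.

```python
# Minimal sketch of the noise-replacement training strategy: with probability
# `p`, one frame in each training clip is swapped for random noise, so the
# ConvLSTM cannot rely blindly on its temporal links. Tensor layout, noise
# distribution, and `p` are assumptions for illustration, not paper details.
import torch


def replace_frame_with_noise(clip: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """clip: (B, T, C, H, W) batch of video clips. Returns a copy in which,
    with probability `p` per clip, one randomly chosen frame is replaced by
    Gaussian noise."""
    clip = clip.clone()
    b, t = clip.shape[:2]
    for i in range(b):
        if torch.rand(1).item() < p:
            j = torch.randint(t, (1,)).item()          # frame index to spoil
            clip[i, j] = torch.randn_like(clip[i, j])  # noise replaces the frame
    return clip


if __name__ == "__main__":
    frames = torch.rand(2, 5, 3, 64, 128)        # toy batch: 2 clips of 5 RGB frames
    noisy = replace_frame_with_noise(frames, p=1.0)
    changed = (noisy != frames).flatten(2).any(dim=2)  # (B, T) mask of replaced frames
    print(changed.sum(dim=1))                    # expect tensor([1, 1])
```

The perturbation is applied only during training; at inference time clips would be fed to the ConvLSTM unchanged, consistent with the abstract's claim that the strategy requires no extra annotation or computational cost.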
Related papers
- Event-guided Low-light Video Semantic Segmentation [6.938849566816958]
Event cameras can capture motion dynamics, filter out temporal-redundant information, and are robust to lighting conditions.
We propose EVSNet, a lightweight framework that leverages event modality to guide the learning of a unified illumination-invariant representation.
Specifically, we leverage a Motion Extraction Module to extract short-term and long-term temporal motions from event modality and a Motion Fusion Module to integrate image features and motion features adaptively.
arXiv Detail & Related papers (2024-11-01T14:54:34Z) - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM
Animator [59.589919015669274]
This study focuses on zero-shot text-to-video generation with an emphasis on data and cost efficiency.
We propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantic-coherence prompt sequence.
We also propose a series of annotative modifications to adapt LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path.
arXiv Detail & Related papers (2023-09-25T19:42:16Z) - Transform-Equivariant Consistency Learning for Temporal Sentence
Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Weakly-supervised Representation Learning for Video Alignment and
Analysis [16.80278496414627]
This paper introduces LRProp -- a novel weakly-supervised representation learning approach.
The proposed algorithm also uses a regularized SoftDTW loss to better tune the learned features.
Our novel representation learning paradigm consistently outperforms the state of the art on temporal alignment tasks.
arXiv Detail & Related papers (2023-02-08T14:01:01Z) - Exploiting long-term temporal dynamics for video captioning [40.15826846670479]
We propose a novel approach, namely temporal and spatial LSTM (TS-LSTM), which systematically exploits spatial and temporal dynamics within video sequences.
Experimental results obtained in two public video captioning benchmarks indicate that our TS-LSTM outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-02-22T11:40:09Z) - Temporal Modulation Network for Controllable Space-Time Video
Super-Resolution [66.06549492893947]
Space-time video super-resolution aims to increase the spatial and temporal resolutions of low-resolution and low-frame-rate videos.
Deformable convolution-based methods have achieved promising STVSR performance, but they can only infer the intermediate frames pre-defined in the training stage.
We propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction.
arXiv Detail & Related papers (2021-04-21T17:10:53Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)