Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing
- URL: http://arxiv.org/abs/2109.02281v1
- Date: Mon, 6 Sep 2021 08:24:38 GMT
- Title: Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing
- Authors: Xingjian He, Weining Wang, Zhiyong Xu, Hao Wang, Jie Jiang, Jing Liu
- Abstract summary: We propose a Spatial-Temporal Semantic Consistency method to capture class-exclusive context information.
Specifically, we design a spatial-temporal consistency loss to constrain the semantic consistency in spatial and temporal dimensions.
Our method wins 1st place in the VSPW challenge at ICCV 2021.
- Score: 11.848929625911575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with image scene parsing, video scene parsing introduces temporal
information, which can effectively improve the consistency and accuracy of
prediction. In this paper, we propose a Spatial-Temporal Semantic Consistency
method to capture class-exclusive context information. Specifically, we design
a spatial-temporal consistency loss to constrain the semantic consistency in
spatial and temporal dimensions. In addition, we adopt a pseudo-labeling
strategy to enrich the training dataset. We obtain scores of 59.84% and
58.85% mIoU on the development (test part 1) and testing sets of VSPW, respectively,
and our method wins 1st place in the VSPW challenge at ICCV 2021.
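The abstract only names the spatial-temporal consistency loss, so the snippet below is a minimal sketch of one plausible reading, assuming the loss ties per-pixel class distributions together with a symmetric KL term across adjacent frames (the temporal dimension) and across two spatial views of the same frame (the spatial dimension); the function name and the loss form are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def spatial_temporal_consistency_loss(logits_t, logits_tp1, logits_t_aug):
    """Hypothetical sketch of a spatial-temporal consistency loss.

    logits_t, logits_tp1: (B, C, H, W) segmentation logits for frames
    t and t+1; logits_t_aug: logits for a spatially augmented view of
    frame t. The symmetric-KL form is an assumption, not the paper's
    exact formulation.
    """
    def sym_kl(p_logits, q_logits):
        p = F.log_softmax(p_logits, dim=1)
        q = F.log_softmax(q_logits, dim=1)
        # F.kl_div expects log-probabilities as input and probabilities
        # as target; average the two directions for symmetry.
        return 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean")
                      + F.kl_div(q, p.exp(), reduction="batchmean"))

    temporal = sym_kl(logits_t, logits_tp1)   # consistency across frames
    spatial = sym_kl(logits_t, logits_t_aug)  # consistency across views
    return temporal + spatial
```

In training, a term like this would be added to the usual per-frame cross-entropy with a weighting coefficient, so the supervised loss still anchors predictions to the ground truth.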
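The pseudo-labeling strategy is likewise only named in the abstract. A generic confidence-thresholded variant is sketched below; the 0.9 threshold, the ignore index 255, and the model interface are assumed for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_frames, conf_threshold=0.9):
    """Generic confidence-filtered pseudo-labeling (assumed variant).

    unlabeled_frames: (B, 3, H, W) video frames without annotations.
    Returns per-pixel labels with low-confidence pixels set to the
    ignore index 255 so the segmentation loss skips them.
    """
    model.eval()
    probs = F.softmax(model(unlabeled_frames), dim=1)  # (B, C, H, W)
    conf, labels = probs.max(dim=1)                    # both (B, H, W)
    labels[conf < conf_threshold] = 255                # drop uncertain pixels
    return labels
```

The pseudo-labeled frames can then be mixed into the labeled set and trained on with cross_entropy(..., ignore_index=255), which is one common way to enrich a training dataset with unlabeled video.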
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+ [10.069192320623031]
We propose a two-stage deep learning framework for the WeatherProof dataset challenge.
In the challenge, our solution achieved a score of 0.43 on the Mean Intersection over Union (mIoU) metric, ranking 4th.
arXiv Detail & Related papers (2024-06-08T16:22:26Z)
- Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression [59.632286735304156]
We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video codec obtains more accurate inter prediction contexts.
arXiv Detail & Related papers (2024-01-29T03:30:21Z)
- Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation [64.85974098314344]
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video.
Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images.
We propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism.
arXiv Detail & Related papers (2023-09-23T02:40:28Z)
- Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach [7.112725255953468]
This paper presents the winning solution of the CVPR 2023 workshop challenge on video semantic segmentation.
Our approach achieves 65.95% mIoU on the VSPW dataset, ranking 1st in the challenge.
arXiv Detail & Related papers (2023-06-06T08:53:53Z)
- Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [31.115226660100294]
We propose a framework that feeds unlabeled video frames together with labeled images into the training of an image shadow detection network.
We then derive spatial and temporal consistency constraints accordingly to enhance generalization in pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
arXiv Detail & Related papers (2022-06-17T14:29:51Z)
- End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end approach that effectively utilizes the unlabeled data.
Video action detection requires both action class prediction and spatio-temporal consistency.
We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
arXiv Detail & Related papers (2022-03-08T18:11:25Z)
- Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision [106.77639982059014]
We present the ConST-CL framework to effectively learn spatio-temporally fine-grained representations.
We first design a region-based self-supervised task which requires the model to learn to transform instance representations from one view to another guided by context features.
We then introduce a simple design that effectively reconciles the simultaneous learning of both holistic and local representations.
arXiv Detail & Related papers (2021-12-09T19:13:41Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.