Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
- URL: http://arxiv.org/abs/2206.08801v1
- Date: Fri, 17 Jun 2022 14:29:51 GMT
- Title: Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
- Authors: Xiao Lu, Yihong Cao, Sheng Liu, Chengjiang Long, Zipei Chen, Xuanyu Zhou, Yimin Yang, Chunxia Xiao
- Abstract summary: We propose a framework that feeds unlabeled video frames together with labeled images into the training of an image shadow detection network.
We then derive the spatial and temporal consistency constraints accordingly for enhancing generalization in the pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
- Score: 31.115226660100294
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is challenging to annotate large-scale datasets for supervised video
shadow detection methods. Directly applying a model trained on labeled images to
video frames may lead to high generalization error and temporally inconsistent
results. In this paper, we address these challenges by proposing a
Spatio-Temporal Interpolation Consistency Training (STICT) framework to
rationally feed the unlabeled video frames together with the labeled images
into an image shadow detection network training. Specifically, we propose the
Spatial and Temporal ICT, in which we define two new interpolation schemes,
i.e., the spatial interpolation and the temporal interpolation. We
then derive the spatial and temporal interpolation consistency constraints
accordingly for enhancing generalization in the pixel-wise classification task
and for encouraging temporally consistent predictions, respectively. In addition,
we design a Scale-Aware Network for multi-scale shadow knowledge learning in
images, and propose a scale-consistency constraint to minimize the discrepancy
among the predictions at different scales. Our proposed approach is extensively
validated on the ViSha dataset and a self-annotated dataset. Experimental
results show that, even without video labels, our approach outperforms most
state-of-the-art supervised, semi-supervised, and unsupervised image/video shadow
detection methods, as well as methods in related tasks. Code and dataset are
available at https://github.com/yihong-97/STICT.
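To make the training objective concrete, below is a minimal PyTorch sketch of the two interpolation consistency terms and the scale-consistency term as the abstract describes them. The function names, the student/teacher pairing, and the use of MSE as the consistency measure are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of the STICT consistency terms (assumptions, not the
# authors' code): `student`/`teacher` are image shadow detectors that
# return pixel-wise logits; MSE is assumed as the consistency measure.
import torch
import torch.nn.functional as F

def interpolate_inputs(a, b, lam):
    # Convex combination of two inputs -- the interpolation operation
    # underlying both the spatial and the temporal scheme.
    return lam * a + (1.0 - lam) * b

def ict_loss(student, teacher, x1, x2, lam):
    # Interpolation consistency: the student's prediction on an
    # interpolated input should match the same interpolation of the
    # teacher's predictions on the two original inputs.
    with torch.no_grad():
        target = interpolate_inputs(torch.sigmoid(teacher(x1)),
                                    torch.sigmoid(teacher(x2)), lam)
    pred = torch.sigmoid(student(interpolate_inputs(x1, x2, lam)))
    return F.mse_loss(pred, target)

def scale_consistency_loss(preds):
    # Scale consistency: minimize the discrepancy among predictions
    # produced at different scales, after resizing to a common size.
    ref = preds[0]
    loss = ref.new_zeros(())
    for p in preds[1:]:
        p = F.interpolate(p, size=ref.shape[-2:], mode="bilinear",
                          align_corners=False)
        loss = loss + F.mse_loss(p, ref)
    return loss / max(len(preds) - 1, 1)

# Spatial ICT mixes two unlabeled samples; temporal ICT applies the
# same constraint to temporally adjacent frames of an unlabeled video:
# loss = supervised_bce
#      + w_s * ict_loss(student, teacher, img_a, img_b, lam)
#      + w_t * ict_loss(student, teacher, frame_t, frame_t1, lam)
#      + w_c * scale_consistency_loss(multi_scale_preds)
```

In a typical mean-teacher setup the teacher weights would be an exponential moving average of the student's, while the labeled images additionally receive an ordinary supervised loss.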
Related papers
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z)
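As a rough illustration of an inter-patch relationship prediction pretext task (the summary above gives no architectural details, so everything here is an assumption), one common formulation samples two patches from a frame and trains a classifier to predict their relative spatial position:

```python
# Hypothetical sketch of an inter-patch relation pretext task: predict
# the relative position (one of 8 neighbors) of patch B w.r.t. patch A.
import torch
import torch.nn as nn

class PatchRelationHead(nn.Module):
    def __init__(self, feat_dim=128, num_relations=8):
        super().__init__()
        self.classifier = nn.Linear(2 * feat_dim, num_relations)

    def forward(self, feat_a, feat_b):
        # Concatenate the two patch embeddings and classify the relation.
        return self.classifier(torch.cat([feat_a, feat_b], dim=-1))

# Training signal: cross-entropy against the known relative position.
head = PatchRelationHead()
feat_a, feat_b = torch.randn(4, 128), torch.randn(4, 128)
relation = torch.randint(0, 8, (4,))          # ground-truth relation id
loss = nn.functional.cross_entropy(head(feat_a, feat_b), relation)
```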
- Learning Real-World Image De-Weathering with Imperfect Supervision [57.748585821252824]
Existing real-world de-weathering datasets often exhibit inconsistent illumination, position, and textures between the ground-truth images and the input degraded images.
We develop a Consistent Label Constructor (CLC) to generate a pseudo-label as consistent as possible with the input degraded image.
We combine the original imperfect labels and pseudo-labels to jointly supervise the de-weathering model by the proposed Information Allocation Strategy.
arXiv Detail & Related papers (2023-10-23T14:02:57Z)
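The summary suggests supervising the model with both the original imperfect label and the CLC pseudo-label. A minimal sketch of such joint supervision follows, with a fixed per-term weighting standing in for the paper's Information Allocation Strategy (the weighting and L1 loss here are assumptions):

```python
# Minimal sketch of joint supervision from two label sources; the fixed
# weights stand in for the paper's Information Allocation Strategy,
# whose actual form is not described in the summary.
import torch.nn.functional as F

def joint_supervision_loss(pred, imperfect_label, pseudo_label,
                           w_imperfect=0.5, w_pseudo=0.5):
    loss_imperfect = F.l1_loss(pred, imperfect_label)  # original GT
    loss_pseudo = F.l1_loss(pred, pseudo_label)        # CLC output
    return w_imperfect * loss_imperfect + w_pseudo * loss_pseudo
```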
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
arXiv Detail & Related papers (2023-09-04T06:41:33Z)
- Unsupervised CD in satellite image time series by contrastive learning and feature tracking [15.148034487267635]
We propose a two-stage approach to unsupervised change detection in satellite image time-series using contrastive learning with feature tracking.
By deriving pseudo labels from pre-trained models and using feature tracking to propagate them among the image time-series, we improve the consistency of our pseudo labels and address the challenges of seasonal changes in long-term remote sensing image time-series.
arXiv Detail & Related papers (2023-04-22T11:19:19Z)
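A minimal sketch of the propagation idea: derive a pseudo-label for each image from a pre-trained model, then carry labels along the time series by matching per-pixel features between consecutive images. The nearest-neighbour matching and cosine similarity used here are assumptions, not the paper's exact tracking scheme:

```python
# Hypothetical sketch: propagate a per-pixel pseudo-label from image t
# to image t+1 by nearest-neighbour matching of L2-normalized features.
import torch
import torch.nn.functional as F

def propagate_pseudo_label(feat_t, feat_t1, label_t):
    # feat_*: (C, H, W) feature maps; label_t: (H, W) pseudo-label map.
    c, h, w = feat_t.shape
    a = F.normalize(feat_t.reshape(c, -1), dim=0)    # (C, H*W)
    b = F.normalize(feat_t1.reshape(c, -1), dim=0)   # (C, H*W)
    sim = b.t() @ a                                  # (H*W, H*W) cosine sim
    match = sim.argmax(dim=1)                        # best match in image t
    return label_t.reshape(-1)[match].reshape(h, w)  # propagated label
```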
- OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos [21.893572076171527]
We propose a method that leverages an attention mask for occluded joints and encodes temporal dependency between frames using transformers.
We achieve state-of-the-art pose estimation results on the PoseTrack 2017 and PoseTrack 2018 datasets.
arXiv Detail & Related papers (2022-07-20T08:06:06Z)
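The core mechanism named in the summary, masking attention for occluded joints, can be sketched as a standard masked softmax over joint tokens. The mask convention (True = occluded) and the dimensions are assumptions:

```python
# Hypothetical sketch of attention with an occlusion mask over joint
# tokens: occluded joints (mask == True) are excluded as attention keys.
import torch
import torch.nn.functional as F

def masked_joint_attention(q, k, v, occluded):
    # q, k, v: (num_joints, dim); occluded: (num_joints,) bool mask.
    scores = q @ k.t() / k.shape[-1] ** 0.5          # scaled dot product
    scores = scores.masked_fill(occluded[None, :], float("-inf"))
    return F.softmax(scores, dim=-1) @ v             # attended values

q = k = v = torch.randn(17, 64)                      # e.g. 17 COCO joints
occluded = torch.zeros(17, dtype=torch.bool)
occluded[[3, 4]] = True                              # joints 3 and 4 hidden
out = masked_joint_attention(q, k, v, occluded)
```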
- TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning [79.77010271213695]
We propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL).
Our TCGL integrates prior knowledge about frame and snippet orders into graph structures, i.e., the intra-/inter-snippet Temporal Contrastive Graphs (TCG).
To generate supervisory signals for unlabeled videos, we introduce an Adaptive Snippet Order Prediction (ASOP) module.
arXiv Detail & Related papers (2021-12-07T09:27:56Z)
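The ASOP module generates supervisory signals from unlabeled video by predicting snippet order. A bare-bones version of order prediction as a pretext task is sketched below; the classifier and feature dimensions are assumptions, and the "adaptive" part of ASOP is omitted:

```python
# Bare-bones snippet order prediction: shuffle snippet features and
# classify which permutation was applied (the self-supervision signal).
import itertools
import torch
import torch.nn as nn

NUM_SNIPPETS = 3
PERMS = list(itertools.permutations(range(NUM_SNIPPETS)))  # 6 classes

class OrderPredictor(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.classifier = nn.Linear(NUM_SNIPPETS * feat_dim, len(PERMS))

    def forward(self, snippet_feats):                # (B, 3, feat_dim)
        return self.classifier(snippet_feats.flatten(1))

feats = torch.randn(2, NUM_SNIPPETS, 256)
perm_id = torch.randint(0, len(PERMS), (2,))
shuffled = torch.stack(
    [feats[i, list(PERMS[int(p)])] for i, p in enumerate(perm_id)])
loss = nn.functional.cross_entropy(OrderPredictor()(shuffled), perm_id)
```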
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
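The summary mentions a co-attention formulation for combining low-level and high-level features; one common way to realize that is to let each feature map attend to the other through a shared affinity matrix. Every detail below is an assumed, generic form rather than the paper's module:

```python
# Generic co-attention sketch: low- and high-level feature maps exchange
# information through a shared affinity matrix (an assumed formulation).
import torch
import torch.nn.functional as F

def co_attention(low, high):
    # low, high: (C, H, W) feature maps at a common resolution.
    c, h, w = low.shape
    l = low.reshape(c, -1)                        # (C, H*W)
    g = high.reshape(c, -1)                       # (C, H*W)
    affinity = l.t() @ g / c ** 0.5               # (H*W, H*W)
    low_att = g @ F.softmax(affinity, dim=1).t()  # high info -> low pixels
    high_att = l @ F.softmax(affinity, dim=0)     # low info -> high pixels
    return low_att.reshape(c, h, w), high_att.reshape(c, h, w)
```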
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
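To ground the graph construction: nodes are grid patches sampled from frames, and one natural edge weighting is the feature affinity between patches of consecutive frames, normalized into transition probabilities for walking along the space-time graph. This is a generic sketch; the paper's two edge types are not spelled out in the summary:

```python
# Generic sketch of one edge type in a space-time graph: affinities
# between grid-patch embeddings of two consecutive frames, normalized
# into transition probabilities for walking along the graph.
import torch
import torch.nn.functional as F

def patch_transition_probs(patches_t, patches_t1, temperature=0.07):
    # patches_*: (N, D) embeddings of N grid patches per frame.
    a = F.normalize(patches_t, dim=1)
    b = F.normalize(patches_t1, dim=1)
    affinity = a @ b.t() / temperature      # (N, N) patch-to-patch edges
    return F.softmax(affinity, dim=1)       # rows sum to 1: frame t -> t+1

probs = patch_transition_probs(torch.randn(49, 128), torch.randn(49, 128))
```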
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features [7.895528973776606]
This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion.
FCDNet takes a stack of consecutive frames as input and effectively learns interpolation artifacts through its network blocks.
arXiv Detail & Related papers (2021-03-25T08:47:46Z)
- Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval [83.56444443849679]
This work takes advantage of the temporal dependencies within videos and proposes a novel self-supervised method named Temporal Contrastive Graph Learning (TCGL).
Our TCGL is rooted in a hybrid graph contrastive learning strategy that jointly regards the inter-snippet and intra-snippet temporal dependencies as self-supervision signals for temporal representation learning.
Experimental results demonstrate the superiority of our TCGL over the state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.
arXiv Detail & Related papers (2021-01-04T08:11:39Z)