Triple-cooperative Video Shadow Detection
- URL: http://arxiv.org/abs/2103.06533v1
- Date: Thu, 11 Mar 2021 08:54:19 GMT
- Title: Triple-cooperative Video Shadow Detection
- Authors: Zhihao Chen, Liang Wan, Lei Zhu, Jia Shen, Huazhu Fu, Wennan Liu, Jing Qin
- Abstract summary: We collect a new video shadow detection dataset, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions.
We also develop a new baseline model, named triple-cooperative video shadow detection network (TVSD-Net)
Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos.
- Score: 43.030759888063194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shadow detection in a single image has received significant research interest
in recent years. However, far fewer works have explored shadow detection in
dynamic scenes. The bottleneck is the lack of a well-established
dataset with high-quality annotations for video shadow detection. In this work,
we collect a new video shadow detection dataset, named ViSha, which contains 120
videos with 11,685 frames, covering 60 object categories, varying lengths, and different
motion/lighting conditions. All the frames are annotated with a high-quality
pixel-level shadow mask. To the best of our knowledge, this is the first
learning-oriented dataset for video shadow detection. Furthermore, we develop a
new baseline model, named triple-cooperative video shadow detection network
(TVSD-Net). It utilizes triple parallel networks in a cooperative manner to
learn discriminative representations at intra-video and inter-video levels.
Within the network, a dual gated co-attention module is proposed to constrain
features from neighboring frames in the same video, while an auxiliary
similarity loss is introduced to mine semantic information between different
videos. Finally, we conduct a comprehensive study on ViSha, evaluating 12
state-of-the-art models (including single image shadow detectors, video object
segmentation, and saliency detection methods). Experiments demonstrate that our
model outperforms SOTA competitors.
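The dual gated co-attention idea described above can be sketched roughly as follows. This is a simplified NumPy illustration of the general pattern (cross-frame attention whose output is modulated by a learned gate before being mixed back in), not the paper's actual module; the function name, gating form, and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_gated_coattention(fa, fb):
    """Hypothetical sketch of dual gated co-attention between two
    neighboring frames. fa, fb: flattened spatial features (n, d).
    Each frame attends to the other, and a sigmoid gate controls
    how much cross-frame context is added back."""
    affinity = fa @ fb.T                        # (n, n) cross-frame affinity
    a_from_b = softmax(affinity, axis=1) @ fb   # context for frame A from B
    b_from_a = softmax(affinity.T, axis=1) @ fa # context for frame B from A
    # scalar gate per location, driven by agreement with the context
    gate_a = 1.0 / (1.0 + np.exp(-(fa * a_from_b).sum(-1, keepdims=True)))
    gate_b = 1.0 / (1.0 + np.exp(-(fb * b_from_a).sum(-1, keepdims=True)))
    return fa + gate_a * a_from_b, fb + gate_b * b_from_a
```

In the paper the gating and attention weights are learned end-to-end; the sketch only shows the data flow.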
Related papers
- Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm [14.39527406033429]
The proposed algorithm is designed to handle large video datasets and retrieve the videos most related to a given query video clip.
We split both the candidate and the query videos into a sequence of clips and convert each clip to a representation vector using an autoencoder-backed deep neural network.
We then calculate a similarity measure between the sequences of embedding vectors using a bi-directional dynamic time-warping method.
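The core of that similarity measure is dynamic time warping over sequences of embedding vectors. A minimal sketch of the standard DTW recurrence with a cosine cost is below; the paper's bi-directional variant and its autoencoder embedding are not reproduced, and the cost choice here is an assumption.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two sequences of
    embedding vectors a (n, d) and b (m, d), using cosine distance
    as the local cost. Returns the accumulated alignment cost."""
    n, m = len(a), len(b)
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - an @ bn.T                      # (n, m) pairwise cosine distances
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # step from match, insertion, or deletion
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]
```

Identical sequences align along the diagonal with zero cost, so the distance degrades gracefully as clips drift apart in time.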
arXiv Detail & Related papers (2023-09-03T03:10:18Z)
- Detect Any Shadow: Segment Anything for Video Shadow Detection [105.19693622157462]
We propose ShadowSAM, a framework for fine-tuning segment anything model (SAM) to detect shadows.
By combining it with long short-term attention mechanism, we extend its capability for efficient video shadow detection.
Our method exhibits accelerated inference speed compared to previous video shadow detection approaches.
arXiv Detail & Related papers (2023-05-26T07:39:10Z)
- Video Instance Shadow Detection Under the Sun and Sky [81.95848151121739]
ViShadow is a semi-supervised video instance shadow detection framework.
It identifies shadow and object instances through contrastive learning for cross-frame pairing.
A retrieval mechanism is introduced to handle instances that temporarily disappear from view.
arXiv Detail & Related papers (2022-11-23T10:20:19Z)
- SCOTCH and SODA: A Transformer Video Shadow Detection Framework [12.42397422225366]
Shadows in videos are difficult to detect because of the large shadow deformation between frames.
We introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module.
We also present a new shadow contrastive learning mechanism (SCOTCH) which aims at guiding the network to learn a unified shadow representation.
arXiv Detail & Related papers (2022-11-13T12:23:07Z)
- Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [31.115226660100294]
We propose a framework that feeds unlabeled video frames, together with labeled images, into the training of an image shadow detection network.
We then derive the spatial and temporal consistency constraints accordingly for enhancing generalization in the pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
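The interpolation-consistency idea can be illustrated with a tiny sketch: the prediction on an intermediate frame should agree with the interpolation of the predictions on the surrounding frames. The exact loss form in the paper may differ; the function below is an assumed, simplified version.

```python
import numpy as np

def interpolation_consistency_loss(pred_t, pred_t1, pred_mid, alpha=0.5):
    """Assumed form of a temporal interpolation-consistency constraint:
    penalize disagreement between the prediction on an intermediate
    frame (pred_mid) and the blend of the two endpoint predictions.
    All inputs are shadow-probability maps of the same shape."""
    target = alpha * pred_t + (1.0 - alpha) * pred_t1
    return float(np.mean((pred_mid - target) ** 2))
```

On unlabeled frames such a term provides a training signal without any pixel-level annotation, which is the point of the semi-supervised setup.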
arXiv Detail & Related papers (2022-06-17T14:29:51Z)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion [92.80981308407098]
We propose an approach that combines the strengths of motion-based and appearance-based segmentation.
We propose to supervise an image segmentation network, tasking it with predicting regions that are likely to contain simple motion patterns.
In the unsupervised video segmentation mode, the network is trained on a collection of unlabelled videos, using the learning process itself as an algorithm to segment these videos.
arXiv Detail & Related papers (2022-05-16T17:55:34Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Temporal Feature Warping for Video Shadow Detection [30.82493923485278]
We propose a simple but powerful method to better aggregate information temporally.
We use an optical flow based warping module to align and then combine features between frames.
We apply this warping module across multiple deep-network layers to retrieve information from neighboring frames including both local details and high-level semantic information.
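Flow-based feature warping of the kind described is essentially bilinear backward sampling of a feature map along a flow field. A generic NumPy sketch is below; the paper's flow estimator and the layers it is applied to are not reproduced, and the `(dx, dy)` flow convention is an assumption.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly warp a feature map feat (H, W, C) by a flow field
    flow (H, W, 2) holding per-pixel (dx, dy) offsets, sampling the
    source location for each target pixel (backward warping)."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, W - 1)    # sample coordinates,
    y = np.clip(ys + flow[..., 1], 0, H - 1)    # clipped to the border
    x0 = np.floor(x).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.floor(y).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    wx, wy = x - x0, y - y0                     # bilinear weights
    return (feat[y0, x0] * ((1 - wx) * (1 - wy))[..., None]
          + feat[y0, x1] * (wx * (1 - wy))[..., None]
          + feat[y1, x0] * ((1 - wx) * wy)[..., None]
          + feat[y1, x1] * (wx * wy)[..., None])
```

With a zero flow field the warp is the identity, which makes the module easy to sanity-check before plugging in estimated flow.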
arXiv Detail & Related papers (2021-07-29T19:12:50Z)
- Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.