SCOTCH and SODA: A Transformer Video Shadow Detection Framework
- URL: http://arxiv.org/abs/2211.06885v2
- Date: Mon, 27 Mar 2023 02:58:17 GMT
- Title: SCOTCH and SODA: A Transformer Video Shadow Detection Framework
- Authors: Lihao Liu, Jean Prost, Lei Zhu, Nicolas Papadakis, Pietro Liò,
Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero
- Abstract summary: Shadows in videos are difficult to detect because of the large shadow deformation between frames.
We introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module.
We also present a new shadow contrastive learning mechanism (SCOTCH) which aims at guiding the network to learn a unified shadow representation.
- Score: 12.42397422225366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shadows in videos are difficult to detect because of the large shadow
deformation between frames. In this work, we argue that accounting for shadow
deformation is essential when designing a video shadow detection method. To
this end, we introduce the shadow deformation attention trajectory (SODA), a
new type of video self-attention module, specially designed to handle the large
shadow deformations in videos. Moreover, we present a new shadow contrastive
learning mechanism (SCOTCH) which aims at guiding the network to learn a
unified shadow representation from massive positive shadow pairs across
different videos. We demonstrate empirically the effectiveness of our two
contributions in an ablation study. Furthermore, we show that SCOTCH and SODA
significantly outperform existing techniques for video shadow detection. Code
is available at the project page:
https://lihaoliu-cambridge.github.io/scotch_and_soda/
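The abstract names two mechanisms but gives no formulas, so as a rough illustration only, here is a minimal PyTorch sketch of a SCOTCH-style contrastive objective: pooled shadow descriptors from different videos act as mutual positives and non-shadow descriptors act as negatives. All names, shapes, and the temperature value are assumptions, not the authors' implementation, and SODA's trajectory attention is not sketched here.
```python
# Minimal sketch (not the authors' code): a SCOTCH-style contrastive loss that
# pulls pooled shadow descriptors from different videos together and pushes
# them away from non-shadow descriptors. Feature extraction/pooling is assumed.
import torch
import torch.nn.functional as F


def shadow_contrastive_loss(shadow_feats, nonshadow_feats, temperature=0.1):
    """shadow_feats:    (N, D) shadow descriptors, ideally drawn from many videos
       nonshadow_feats: (M, D) non-shadow descriptors used as negatives."""
    z_s = F.normalize(shadow_feats, dim=1)                    # (N, D)
    z_n = F.normalize(nonshadow_feats, dim=1)                 # (M, D)

    pos_sim = z_s @ z_s.t() / temperature                     # (N, N) shadow-shadow
    neg_sim = z_s @ z_n.t() / temperature                     # (N, M) shadow-nonshadow
    logits = torch.cat([pos_sim, neg_sim], dim=1)             # (N, N + M)

    n, m = pos_sim.size(0), neg_sim.size(1)
    self_mask = torch.eye(n, dtype=torch.bool, device=logits.device)
    pad = torch.zeros(n, m, dtype=torch.bool, device=logits.device)
    # exclude each anchor's similarity to itself (large negative, not -inf, to avoid NaNs)
    logits = logits.masked_fill(torch.cat([self_mask, pad], dim=1), -1e9)

    # InfoNCE-style: every other shadow descriptor is a positive for each anchor
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_mask = torch.cat([~self_mask, pad], dim=1).float()
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()


# usage with random tensors standing in for pooled encoder outputs
loss = shadow_contrastive_loss(torch.randn(8, 256), torch.randn(16, 256))
```
The batching strategy for drawing "massive positive shadow pairs across different videos" is left out; only the loss shape is illustrated.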
Related papers
- RenDetNet: Weakly-supervised Shadow Detection with Shadow Caster Verification [15.68136544586505]
Existing shadow detection models struggle to differentiate dark image areas from shadows.
In this paper, we tackle this issue by verifying that all detected shadows are real, i.e. they have paired shadow casters.
We perform this step in a physically-accurate manner by differentiably re-rendering the scene and observing the changes stemming from carving out estimated shadow casters.
Thanks to this approach, the RenDetNet proposed in this paper is the first learning-based shadow detection model whose supervisory signals can be computed in a self-supervised manner.
arXiv Detail & Related papers (2024-08-30T09:34:36Z)
- Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [22.173407949204137]
Video Shadow Detection (VSD) aims to detect shadow masks across a frame sequence.
We propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD.
arXiv Detail & Related papers (2024-08-21T17:16:21Z)
- SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection [90.4751446041017]
We present SwinShadow, a transformer-based architecture that fully utilizes the powerful shifted window mechanism for detecting adjacent shadows.
The whole process can be divided into three parts: encoder, decoder, and feature integration.
Experiments on three shadow detection benchmark datasets, SBU, UCF, and ISTD, demonstrate that our network achieves good performance in terms of balance error rate (BER).
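The shifted-window mechanism referred to here originates from Swin Transformer; the sketch below shows only that core idea (windowed self-attention with a cyclic shift so neighbouring windows exchange information), with window size, channel count, and head count as placeholder choices rather than SwinShadow's configuration.
```python
# Illustrative-only sketch of shifted-window self-attention (Swin-style); not
# the SwinShadow implementation. Dimensions and hyperparameters are placeholders.
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    def __init__(self, dim=64, window=7, heads=4, shift=0):
        super().__init__()
        self.window, self.shift = window, shift
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, C), H and W divisible by window
        B, H, W, C = x.shape
        w, s = self.window, self.shift
        if s:                                  # cyclic shift so adjacent windows mix
            x = torch.roll(x, shifts=(-s, -s), dims=(1, 2))
        # partition into non-overlapping (w x w) windows
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        x, _ = self.attn(x, x, x)              # self-attention inside each window
        # merge the windows back into a (B, H, W, C) feature map
        x = x.reshape(B, H // w, W // w, w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        if s:
            x = torch.roll(x, shifts=(s, s), dims=(1, 2))
        return x


block = WindowAttention(dim=64, window=7, shift=3)
out = block(torch.randn(1, 28, 28, 64))        # output shape: (1, 28, 28, 64)
```
The full Swin design also masks attention across regions wrapped around by the shift; that detail is omitted here for brevity.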
arXiv Detail & Related papers (2024-08-07T03:16:33Z)
- Detect Any Shadow: Segment Anything for Video Shadow Detection [105.19693622157462]
We propose ShadowSAM, a framework for fine-tuning the Segment Anything Model (SAM) to detect shadows.
By combining it with a long short-term attention mechanism, we extend its capability for efficient video shadow detection.
Our method exhibits accelerated inference speed compared to previous video shadow detection approaches.
arXiv Detail & Related papers (2023-05-26T07:39:10Z)
- ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal [53.01990632289937]
We propose a Transformer-based model for document shadow removal.
It uses shadow context encoding and decoding in both shadow and shadow-free regions.
arXiv Detail & Related papers (2022-11-30T01:46:29Z)
- Video Instance Shadow Detection Under the Sun and Sky [81.95848151121739]
ViShadow is a semi-supervised video instance shadow detection framework.
It identifies shadow and object instances through contrastive learning for cross-frame pairing.
A retrieval mechanism is introduced to manage temporary disappearances.
arXiv Detail & Related papers (2022-11-23T10:20:19Z)
- DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity [54.831083157152136]
We present a method that removes hard, soft and self shadows based on adaptive attention and ViT similarity.
Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR and UIUC datasets.
arXiv Detail & Related papers (2022-11-15T12:15:29Z)
- Learning Shadow Correspondence for Video Shadow Detection [42.1593380820498]
We present a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance pixel-wise similarity of the specific shadow regions across frames for video shadow detection.
SC-Cor is a plug-and-play module that can be easily integrated into existing shadow detectors with no extra computational cost.
Experimental results show that SC-Cor outperforms the prior state-of-the-art method by 6.51% on IoU and 3.35% on the newly introduced temporal stability metric.
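The summary does not spell out how the pixel-wise similarity is enforced; as a loose illustration only, the sketch below encourages every shadow-pixel feature in one frame to have a close match among the shadow-pixel features of the next frame. It is a proxy for the idea, not the SC-Cor formulation, and all names and shapes are assumptions.
```python
# Rough, illustrative proxy for a shadow-consistent correspondence loss:
# features at shadow pixels in one frame should have a close cosine match
# among shadow-pixel features of the next frame. Not the SC-Cor formulation.
import torch
import torch.nn.functional as F


def shadow_correspondence_loss(feat_t, feat_t1, mask_t, mask_t1):
    """feat_*: (C, H, W) frame features; mask_*: (H, W) boolean shadow masks."""
    f_t = F.normalize(feat_t.flatten(1)[:, mask_t.flatten()], dim=0)      # (C, Nt)
    f_t1 = F.normalize(feat_t1.flatten(1)[:, mask_t1.flatten()], dim=0)   # (C, Nt1)
    if f_t.numel() == 0 or f_t1.numel() == 0:
        return feat_t.sum() * 0.0        # no shadow pixels: zero loss, graph kept intact
    sim = f_t.t() @ f_t1                 # (Nt, Nt1) cosine similarities
    best = sim.max(dim=1).values         # best match in the next frame per shadow pixel
    return (1.0 - best).mean()           # push every shadow pixel toward a close match
```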
arXiv Detail & Related papers (2022-07-30T06:30:42Z)
- R2D: Learning Shadow Removal to Enhance Fine-Context Shadow Detection [64.10636296274168]
Current shadow detection methods perform poorly when detecting shadow regions that are small, unclear or have blurry edges.
We propose a new method called Restore to Detect (R2D), where a deep neural network is trained for restoration (shadow removal) as an auxiliary task to enhance shadow detection.
We show that our proposed method R2D improves the shadow detection performance while being able to detect fine context better compared to the other recent methods.
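Only the high-level idea (train for removal to aid detection) is stated above; the sketch below is a generic two-head layout, a shared encoder with one head regressing a shadow-free image and one predicting a shadow mask, trained jointly. Module names, sizes, and losses are assumptions, not the R2D architecture.
```python
# Illustrative two-head "restore to detect" layout (not the R2D architecture):
# a shared encoder feeds a shadow-removal head and a shadow-mask head.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RestoreToDetect(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.removal_head = nn.Conv2d(ch, 3, 3, padding=1)   # shadow-free image
        self.detect_head = nn.Conv2d(ch, 1, 3, padding=1)    # shadow-mask logits

    def forward(self, x):
        h = self.encoder(x)
        return self.removal_head(h), self.detect_head(h)


model = RestoreToDetect()
img = torch.randn(2, 3, 128, 128)
shadow_free_gt = torch.randn(2, 3, 128, 128)                  # placeholder ground truth
mask_gt = (torch.rand(2, 1, 128, 128) > 0.5).float()          # placeholder shadow mask
restored, mask_logits = model(img)
# joint training: L1 on the restored image plus BCE on the predicted mask
loss = F.l1_loss(restored, shadow_free_gt) + \
       F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
```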
arXiv Detail & Related papers (2021-09-20T15:09:22Z)
- Temporal Feature Warping for Video Shadow Detection [30.82493923485278]
We propose a simple but powerful method to better aggregate information temporally.
We use an optical flow based warping module to align and then combine features between frames.
We apply this warping module across multiple deep-network layers to retrieve information from neighboring frames including both local details and high-level semantic information.
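As a minimal sketch of this idea, flow-based feature alignment can be written with grid_sample: the neighbouring frame's features are sampled at positions displaced by the flow and then fused with the current frame's features. The flow network itself and the fusion rule used in the paper are not reproduced; tensor names and the simple averaging are assumptions.
```python
# Minimal sketch of flow-based feature warping with grid_sample: features from
# a neighbouring frame are aligned to the current frame before being combined.
# Flow estimation (e.g. an off-the-shelf optical-flow network) is assumed.
import torch
import torch.nn.functional as F


def warp_features(feat, flow):
    """feat: (B, C, H, W) features of the neighbouring frame.
       flow: (B, 2, H, W) flow in pixels mapping current-frame positions into that frame."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=feat.device),
                            torch.arange(W, device=feat.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow     # (B, 2, H, W)
    # normalise sampling locations to [-1, 1] as grid_sample expects
    grid_x = 2.0 * grid[:, 0] / max(W - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(H - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                        # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


# warp the neighbouring frame's features, then fuse (here a simple average)
feat_t, feat_t1 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
flow = torch.zeros(1, 2, 32, 32)            # placeholder flow field
fused = 0.5 * (feat_t + warp_features(feat_t1, flow))
```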
arXiv Detail & Related papers (2021-07-29T19:12:50Z)
- Triple-cooperative Video Shadow Detection [43.030759888063194]
We collect a new video shadow detection dataset, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions.
We also develop a new baseline model, named the triple-cooperative video shadow detection network (TVSD-Net).
Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos.
arXiv Detail & Related papers (2021-03-11T08:54:19Z)
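The dual gated co-attention module in TVSD-Net is only named above; as a rough illustration, the sketch below shows one generic way to gate cross-frame attention: each frame queries the other and a learned sigmoid gate controls how much of the attended signal enters a residual update. Names, dimensions, and the gating form are assumptions rather than the paper's design.
```python
# Rough sketch of a gated co-attention idea between two frames of one video;
# illustrative only, not the TVSD-Net module.
import torch
import torch.nn as nn


class GatedCoAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a, b):
        """a, b: (B, N, C) token features of two frames; returns updated `a`."""
        attended, _ = self.attn(a, b, b)                 # `a` queries frame `b`
        g = self.gate(torch.cat([a, attended], dim=-1))  # per-token, per-channel gate
        return a + g * attended                          # gated residual update


coattn = GatedCoAttention(dim=64)
frame_a, frame_b = torch.randn(2, 196, 64), torch.randn(2, 196, 64)
updated_a = coattn(frame_a, frame_b)
updated_b = coattn(frame_b, frame_a)   # "dual": applied in both directions
```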