DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling
- URL: http://arxiv.org/abs/2511.06925v1
- Date: Mon, 10 Nov 2025 10:18:26 GMT
- Title: DTTNet: Improving Video Shadow Detection via Dark-Aware Guidance and Tokenized Temporal Modeling
- Authors: Zhicheng Li, Kunyang Sun, Rui Yao, Hancheng Zhu, Fuyuan Hu, Jiaqi Zhao, Zhiwen Shao, Yong Zhou
- Abstract summary: Video shadow detection confronts two difficulties: distinguishing shadows from complex backgrounds and modeling dynamic shadow deformations under varying illumination. To address shadow-background ambiguity, we leverage linguistic priors through the proposed Vision-language Match Module (VMM) and a Dark-aware Semantic Block (DSB). For temporally varying shadow shapes, we propose a Tokenized Temporal Block (TTB) that decouples spatiotemporal learning.
- Score: 37.33167473664897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video shadow detection confronts two entwined difficulties: distinguishing shadows from complex backgrounds and modeling dynamic shadow deformations under varying illumination. To address shadow-background ambiguity, we leverage linguistic priors through the proposed Vision-language Match Module (VMM) and a Dark-aware Semantic Block (DSB), extracting text-guided features to explicitly differentiate shadows from dark objects. Furthermore, we introduce adaptive mask reweighting to downweight penumbra regions during training and apply edge masks at the final decoder stage for better supervision. For temporal modeling of variable shadow shapes, we propose a Tokenized Temporal Block (TTB) that decouples spatiotemporal learning. TTB summarizes cross-frame shadow semantics into learnable temporal tokens, enabling efficient sequence encoding with minimal computational overhead. Comprehensive experiments on multiple benchmark datasets demonstrate state-of-the-art accuracy and real-time inference efficiency. Code is available at https://github.com/city-cheng/DTTNet.
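The abstract describes TTB only at a high level, so the following is a minimal sketch of the general idea of summarizing cross-frame features into a few tokens, not the authors' implementation. It assumes plain single-head cross-attention without learned projections; the function names `summarize_into_tokens` and `broadcast_tokens`, the tensor shapes, and the random initial tokens are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def summarize_into_tokens(frame_feats, tokens):
    """Pool features from all frames into K tokens (hypothetical helper).

    frame_feats: (T, N, C) array, T frames with N spatial positions each.
    tokens:      (K, C) array of query tokens (learnable in a real model).
    Returns a (K, C) array; each row is a convex combination of the
    T*N feature vectors.
    """
    T, N, C = frame_feats.shape
    keys = frame_feats.reshape(T * N, C)
    attn = softmax(tokens @ keys.T / np.sqrt(C), axis=-1)  # (K, T*N)
    return attn @ keys


def broadcast_tokens(frame_feats, tokens):
    """Let every spatial position read from the K tokens (residual add)."""
    T, N, C = frame_feats.shape
    queries = frame_feats.reshape(T * N, C)
    attn = softmax(queries @ tokens.T / np.sqrt(C), axis=-1)  # (T*N, K)
    return (queries + attn @ tokens).reshape(T, N, C)


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 8))   # 4 frames, 16 positions, 8 channels
init_tokens = rng.standard_normal((2, 8))  # 2 temporal tokens
summary = summarize_into_tokens(feats, init_tokens)
out = broadcast_tokens(feats, summary)
```

Under these assumptions, each attention step costs on the order of K x (T x N) similarity computations instead of the (T x N)^2 needed for full spatiotemporal self-attention, which is one plausible reading of the "minimal computation overhead" claim when K is small.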
Related papers
- Retinex-guided Histogram Transformer for Mask-free Shadow Removal [12.962534359029103]
ReHiT is an efficient mask-free shadow removal framework based on a hybrid CNN-Transformer architecture guided by Retinex theory.
Our solution delivers competitive results with one of the smallest parameter sizes and fastest inference speeds among top-ranked entries.
arXiv Detail & Related papers (2025-04-18T22:19:40Z)
- MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis [64.00425120075045]
Shadows are often under-considered or even ignored in image editing applications, limiting the realism of the edited results.
In this paper, we introduce MetaShadow, a three-in-one versatile framework that enables detection, removal, and controllable synthesis of shadows in natural images in an object-centered fashion.
arXiv Detail & Related papers (2024-12-03T18:04:42Z)
- Test-Time Intensity Consistency Adaptation for Shadow Detection [35.03354405371279]
TICA is a novel framework that leverages light-intensity information during test-time adaptation to enhance shadow detection accuracy.
A basic encoder-decoder model is initially trained on a labeled dataset for shadow detection.
During the testing phase, the network is adjusted for each test sample by enforcing consistent intensity predictions.
arXiv Detail & Related papers (2024-10-10T08:08:32Z)
- Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [22.173407949204137]
Video Shadow Detection (VSD) aims to detect shadow masks from a frame sequence.
Motivated by this, we propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD.
arXiv Detail & Related papers (2024-08-21T17:16:21Z)
- SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection [90.4751446041017]
We present SwinShadow, a transformer-based architecture that fully utilizes the powerful shifted window mechanism for detecting adjacent shadows.
The whole process can be divided into three parts: encoder, decoder, and feature integration.
Experiments on three shadow detection benchmark datasets, SBU, UCF, and ISTD, demonstrate that our network achieves good performance in terms of balance error rate (BER).
arXiv Detail & Related papers (2024-08-07T03:16:33Z)
- Controllable Shadow Generation Using Pixel Height Maps [58.59256060452418]
Physics-based shadow rendering methods require 3D geometries, which are not always available.
Deep learning-based shadow synthesis methods learn a mapping from the light information to an object's shadow without explicitly modeling the shadow geometry.
We introduce pixel height, a novel geometry representation that encodes the correlations between objects, ground, and camera pose.
arXiv Detail & Related papers (2022-07-12T08:29:51Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- R2D: Learning Shadow Removal to Enhance Fine-Context Shadow Detection [64.10636296274168]
Current shadow detection methods perform poorly when detecting shadow regions that are small, unclear or have blurry edges.
We propose a new method called Restore to Detect (R2D), where a deep neural network is trained for restoration (shadow removal).
We show that R2D improves shadow detection performance and detects fine context better than other recent methods.
arXiv Detail & Related papers (2021-09-20T15:09:22Z)