Detect Any Shadow: Segment Anything for Video Shadow Detection
- URL: http://arxiv.org/abs/2305.16698v2
- Date: Wed, 1 Nov 2023 11:20:57 GMT
- Title: Detect Any Shadow: Segment Anything for Video Shadow Detection
- Authors: Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li
- Abstract summary: We propose ShadowSAM, a framework for fine-tuning the segment anything model (SAM) to detect shadows.
By combining it with a long short-term attention mechanism, we extend its capability to efficient video shadow detection.
Our method exhibits accelerated inference speed compared to previous video shadow detection approaches.
- Score: 105.19693622157462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The segment anything model (SAM) has achieved great success in
natural image segmentation. Nevertheless, SAM tends to treat shadows as
background and therefore does not segment them. In this paper,
we propose ShadowSAM, a simple yet effective framework for fine-tuning SAM to
detect shadows. Moreover, by combining it with a long short-term attention
mechanism, we extend its capability to efficient video shadow detection.
Specifically, we first fine-tune SAM on the ViSha training dataset using the
bounding boxes obtained from the ground truth shadow mask. Then during the
inference stage, we simulate user interaction by providing a bounding box to
detect shadows in a specific frame (e.g., the first frame). Subsequently, using the
detected shadow mask as a prior, we employ a long short-term network to learn
spatial correlations between distant frames and temporal consistency between
adjacent frames, thereby achieving precise shadow information propagation
across video frames. Extensive experimental results demonstrate the
effectiveness of our method, which outperforms state-of-the-art approaches by
a notable margin on the MAE and IoU metrics. Moreover, our method offers
faster inference than previous video shadow detection approaches, validating
both its effectiveness and its efficiency. The source code is publicly
available at
https://github.com/harrytea/Detect-AnyShadow.
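As a concrete illustration of the box-prompting setup the abstract describes, here is a minimal sketch assuming the official segment_anything package; the checkpoint path and the mask_to_box / detect_shadow helpers are illustrative stand-ins, not the authors' training code (see the linked repository for that).

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def mask_to_box(mask: np.ndarray) -> np.ndarray:
    """Tight (x1, y1, x2, y2) box around the nonzero region of a binary mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

# Model size and checkpoint path are placeholders.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def detect_shadow(image: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Prompt SAM with one box and return a boolean HxW shadow mask."""
    predictor.set_image(image)  # RGB uint8, HxWx3
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    return masks[0]

# During fine-tuning, `box` comes from the ground-truth mask via mask_to_box;
# at inference on the first frame, it simulates the user-provided box.
```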
Related papers
- Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [22.173407949204137]
Video Shadow Detection (VSD) aims to detect shadow masks across a frame sequence.
We propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD.
arXiv Detail & Related papers (2024-08-21T17:16:21Z)
- SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection [90.4751446041017]
We present SwinShadow, a transformer-based architecture that fully utilizes the powerful shifted window mechanism for detecting adjacent shadows.
The whole process can be divided into three parts: encoder, decoder, and feature integration.
Experiments on three shadow detection benchmark datasets, SBU, UCF, and ISTD, demonstrate that our network achieves good performance in terms of balance error rate (BER).
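The shifted-window mechanism follows the Swin Transformer pattern; the sketch below (illustrative helper names, not SwinShadow's code) shows the cyclic shift that lets fixed-size windows attend across former window borders, which is what allows adjacent shadow regions split by a window border to interact.

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) map into (num_windows * B, ws * ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 8, 8, 96)                            # toy feature map
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))   # shift by ws // 2
windows = window_partition(shifted, ws=4)               # (4, 16, 96)
# ... run window self-attention over `windows`, then reverse the roll ...
```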
arXiv Detail & Related papers (2024-08-07T03:16:33Z)
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
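A hedged sketch of the point-centric prompting idea, again assuming the official segment_anything package; the tracker supplying the points is left abstract, and segment_frame is a hypothetical helper rather than the SAM-PT API.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def segment_frame(frame: np.ndarray, points: np.ndarray) -> np.ndarray:
    """frame: HxWx3 RGB; points: (N, 2) tracked (x, y) query points."""
    predictor.set_image(frame)
    masks, _, _ = predictor.predict(
        point_coords=points,
        point_labels=np.ones(len(points), dtype=int),  # all positive prompts
        multimask_output=False,
    )
    return masks[0]  # the mask follows wherever the tracked points go
```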
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation and propagation, as chunking the video allows it to bypass interrupted frames.
Second, multi-frame information is aggregated during clip-wise matching, resulting in more accurate long-range track association than current frame-wise matching.
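A minimal sketch of clip-wise association under stated assumptions: per-track embeddings have already been averaged within each clip upstream, and adjacent clips are matched by cosine distance with the Hungarian algorithm. The function name and shapes are illustrative, not the paper's API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_clips(emb_a: np.ndarray, emb_b: np.ndarray):
    """Match tracks of clip A (Na, D) to tracks of clip B (Nb, D)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                  # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))          # track i in A <-> track j in B

# Aggregating embeddings over a whole clip before matching is what makes the
# association more robust than linking individual frames.
```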
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Learning Shadow Correspondence for Video Shadow Detection [42.1593380820498]
We present a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance pixel-wise similarity of the specific shadow regions across frames for video shadow detection.
SC-Cor is a plug-and-play module that can be easily integrated into existing shadow detectors with no extra computational cost.
Experimental results show that SC-Cor outperforms the prior state-of-the-art method by 6.51% on IoU and 3.35% on the newly introduced temporal stability metric.
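A rough sketch of the underlying correspondence idea, not SC-Cor's actual module: dense cosine similarity between two frames' feature maps, from which cross-frame shadow-region correspondences can be read off.

```python
import torch
import torch.nn.functional as F

def dense_similarity(feat_t: torch.Tensor, feat_ref: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of every pixel in frame t to every pixel in the
    reference frame. feat_*: (C, H, W); returns an (H*W, H*W) matrix."""
    a = F.normalize(feat_t.flatten(1).t(), dim=1)    # (HW, C)
    b = F.normalize(feat_ref.flatten(1).t(), dim=1)  # (HW, C)
    return a @ b.t()
```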
arXiv Detail & Related papers (2022-07-30T06:30:42Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem of missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- R2D: Learning Shadow Removal to Enhance Fine-Context Shadow Detection [64.10636296274168]
Current shadow detection methods perform poorly on shadow regions that are small or unclear, or that have blurry edges.
We propose a new method called Restore to Detect (R2D), in which a deep neural network is trained for restoration (shadow removal).
We show that our proposed method R2D improves the shadow detection performance while being able to detect fine context better compared to the other recent methods.
arXiv Detail & Related papers (2021-09-20T15:09:22Z)
- Temporal Feature Warping for Video Shadow Detection [30.82493923485278]
We propose a simple but powerful method to better aggregate information temporally.
We use an optical flow based warping module to align and then combine features between frames.
We apply this warping module across multiple deep-network layers to retrieve information from neighboring frames, including both local details and high-level semantic information.
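A self-contained sketch of flow-based feature warping (the standard grid_sample recipe, assumed rather than taken from the paper): neighboring-frame features are aligned to the current frame before being combined.

```python
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) neighbor features; flow: (B, 2, H, W) in pixels."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat)  # (2, H, W), (x, y)
    coords = base.unsqueeze(0) + flow                     # shifted sample points
    coords[:, 0] = 2.0 * coords[:, 0] / (W - 1) - 1.0     # normalize x to [-1, 1]
    coords[:, 1] = 2.0 * coords[:, 1] / (H - 1) - 1.0     # normalize y to [-1, 1]
    grid = coords.permute(0, 2, 3, 1)                     # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)
```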
arXiv Detail & Related papers (2021-07-29T19:12:50Z)
- Triple-cooperative Video Shadow Detection [43.030759888063194]
We collect a new video shadow detection dataset, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions.
We also develop a new baseline model, named the triple-cooperative video shadow detection network (TVSD-Net).
Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos.
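A hedged sketch of what a gated co-attention step could look like (illustrative, not TVSD-Net's exact module): tokens from one frame attend to the other frame's tokens, and a learned gate controls how much attended context is mixed back in.

```python
import torch
import torch.nn as nn

class GatedCoAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):  # dim % num_heads == 0
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        """a, b: (B, N, dim) token features from two neighboring frames."""
        ctx, _ = self.attn(query=a, key=b, value=b)   # a attends to b
        g = self.gate(torch.cat([a, ctx], dim=-1))    # per-token gate in (0, 1)
        return a + g * ctx                            # gated residual fusion
```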
arXiv Detail & Related papers (2021-03-11T08:54:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.