Detect Any Shadow: Segment Anything for Video Shadow Detection
- URL: http://arxiv.org/abs/2305.16698v2
- Date: Wed, 1 Nov 2023 11:20:57 GMT
- Title: Detect Any Shadow: Segment Anything for Video Shadow Detection
- Authors: Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li
- Abstract summary: We propose ShadowSAM, a framework for fine-tuning the segment anything model (SAM) to detect shadows.
By combining it with a long short-term attention mechanism, we extend its capability for efficient video shadow detection.
Our method exhibits accelerated inference speed compared to previous video shadow detection approaches.
- Score: 105.19693622157462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment anything model (SAM) has achieved great success in the field of
natural image segmentation. Nevertheless, SAM tends to consider shadows as
background and therefore does not perform segmentation on them. In this paper,
we propose ShadowSAM, a simple yet effective framework for fine-tuning SAM to
detect shadows. Moreover, by combining it with a long short-term attention
mechanism, we extend its capability for efficient video shadow detection.
Specifically, we first fine-tune SAM on the ViSha training dataset by utilizing
the bounding boxes obtained from the ground-truth shadow masks. Then during the
inference stage, we simulate user interaction by providing bounding boxes to
detect shadows in a specific frame (e.g., the first frame). Subsequently, using the
detected shadow mask as a prior, we employ a long short-term network to learn
spatial correlations between distant frames and temporal consistency between
adjacent frames, thereby achieving precise shadow information propagation
across video frames. Extensive experimental results demonstrate the
effectiveness of our method, with notable margin over the state-of-the-art
approaches in terms of MAE and IoU metrics. Moreover, our method exhibits
accelerated inference speed compared to previous video shadow detection
approaches, validating both its effectiveness and its efficiency. The
source code is now publicly available at
https://github.com/harrytea/Detect-AnyShadow.
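The box-prompted inference described in the abstract can be sketched with a small helper that derives a bounding-box prompt from a binary shadow mask (for instance, a first-frame annotation). This is a minimal illustration, not the authors' code: the `mask_to_box` name and the toy mask are assumptions, and the resulting box would be passed to a box-promptable segmenter such as SAM's predictor to simulate user interaction on one frame.

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> tuple:
    """Derive an (x_min, y_min, x_max, y_max) box prompt from a binary mask.

    A box like this can be fed to a box-promptable segmenter (e.g., SAM's
    predictor) in place of a manually drawn user box.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("mask contains no foreground pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy shadow mask: a 3x4 foreground patch inside a 10x10 frame.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:7] = 1
print(mask_to_box(mask))  # (3, 2, 6, 4)
```

The detected mask of that first frame would then serve as the prior that the long short-term network propagates to the remaining frames.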
Related papers
- Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
- SAM-helps-Shadow: When Segment Anything Model meet shadow removal [8.643096072885909]
In this study, we adapt the segment anything model (SAM) for shadow removal by introducing SAM-helps-Shadow.
Our approach utilizes the model's detection results as a potent prior for shadow detection, followed by shadow removal using a second-order deep unfolding network.
arXiv Detail & Related papers (2023-06-01T06:37:19Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking-error accumulation and propagation, as chunking the video allows interrupted frames to be bypassed.
Second, information from multiple frames is aggregated during clip-wise matching, resulting in more accurate long-range track association than current frame-wise matching.
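The clip-wise view described above can be sketched as a simple partition of frame indices into short, overlapping clips; tracks are then associated within each clip and linked across clips via the shared frames. The clip length, overlap, and `chunk_into_clips` helper are illustrative assumptions, not the paper's implementation.

```python
def chunk_into_clips(num_frames: int, clip_len: int = 5, overlap: int = 1):
    """Split frame indices [0, num_frames) into short, overlapping clips.

    Consecutive clips share `overlap` frames, so tracks found inside one
    clip can be matched to the next clip through the shared frames rather
    than frame by frame.
    """
    if overlap >= clip_len:
        raise ValueError("overlap must be smaller than clip_len")
    clips, start, step = [], 0, clip_len - overlap
    while start < num_frames:
        clips.append(list(range(start, min(start + clip_len, num_frames))))
        if start + clip_len >= num_frames:
            break
        start += step
    return clips

print(chunk_into_clips(12, clip_len=5, overlap=1))
# [[0, 1, 2, 3, 4], [4, 5, 6, 7, 8], [8, 9, 10, 11]]
```

Because each clip boundary repeats the last frame of the previous clip, an interrupted or corrupted frame only affects association inside its own clip rather than the whole trajectory.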
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Learning Shadow Correspondence for Video Shadow Detection [42.1593380820498]
We present a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance pixel-wise similarity of the specific shadow regions across frames for video shadow detection.
SC-Cor is a plug-and-play module that can be easily integrated into existing shadow detectors with no extra computational cost.
Experimental results show that SC-Cor outperforms the prior state-of-the-art method by 6.51% on IoU and 3.35% on the newly introduced temporal stability metric.
arXiv Detail & Related papers (2022-07-30T06:30:42Z)
- SiamMask: A Framework for Fast Online Object Tracking and Segmentation [96.61632757952292]
SiamMask is a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method.
We show that it is possible to extend the framework to handle multiple object tracking and segmentation by simply re-using the multi-task model.
It yields real-time state-of-the-art results on visual-object tracking benchmarks, while at the same time demonstrating competitive performance at a high speed for video object segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T14:47:17Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem of missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- R2D: Learning Shadow Removal to Enhance Fine-Context Shadow Detection [64.10636296274168]
Current shadow detection methods perform poorly on shadow regions that are small or unclear, or that have blurry edges.
We propose a new method called Restore to Detect (R2D), in which a deep neural network is trained for restoration (shadow removal).
We show that our proposed method R2D improves the shadow detection performance while being able to detect fine context better compared to the other recent methods.
arXiv Detail & Related papers (2021-09-20T15:09:22Z)
- Temporal Feature Warping for Video Shadow Detection [30.82493923485278]
We propose a simple but powerful method to better aggregate information temporally.
We use an optical flow based warping module to align and then combine features between frames.
We apply this warping module across multiple deep-network layers to retrieve information from neighboring frames, including both local details and high-level semantic information.
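A minimal sketch of such flow-based warping, assuming a dense flow field is already available (e.g., from an optical flow estimator): each target pixel samples the source feature map at its flow-displaced location. The nearest-neighbor rounding and the `warp_features` helper are simplifications for illustration; a real warping module would use differentiable bilinear sampling inside the network.

```python
import numpy as np

def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp an (H, W, C) feature map by an (H, W, 2) flow field.

    For each target pixel (y, x) we sample the source at (y + dy, x + dx),
    rounded to the nearest pixel and clipped to the image bounds.
    flow[..., 0] is the horizontal (x) displacement, flow[..., 1] vertical.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    return feat[src_y, src_x]

# Constant flow of (+1, 0): every pixel samples its right-hand neighbor.
feat = np.arange(12, dtype=float).reshape(3, 4, 1)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
warped = warp_features(feat, flow)
```

Once neighboring frames' features are aligned this way, they can be combined (e.g., averaged or fused by a small network) with the current frame's features before prediction.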
arXiv Detail & Related papers (2021-07-29T19:12:50Z)
- Triple-cooperative Video Shadow Detection [43.030759888063194]
We collect a new video shadow detection dataset, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions.
We also develop a new baseline model, named the triple-cooperative video shadow detection network (TVSD-Net).
Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos.
arXiv Detail & Related papers (2021-03-11T08:54:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.