A Spatial-Temporal Deformable Attention based Framework for Breast
Lesion Detection in Videos
- URL: http://arxiv.org/abs/2309.04702v1
- Date: Sat, 9 Sep 2023 07:00:10 GMT
- Title: A Spatial-Temporal Deformable Attention based Framework for Breast
Lesion Detection in Videos
- Authors: Chao Qin and Jiale Cao and Huazhu Fu and Rao Muhammad Anwer and Fahad
Shahbaz Khan
- Abstract summary: We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains state-of-the-art detection performance.
- Score: 107.96514633713034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting breast lesions in videos is crucial for computer-aided diagnosis.
Existing video-based breast lesion detection approaches typically perform
temporal feature aggregation of deep backbone features based on the
self-attention operation. We argue that such a strategy struggles to
effectively perform deep feature aggregation and ignores useful local
information. To tackle these issues, we propose a spatial-temporal deformable
attention based framework, named STNet. Our STNet introduces a spatial-temporal
deformable attention module to perform local spatial-temporal feature fusion.
The spatial-temporal deformable attention module enables deep feature
aggregation in each stage of both encoder and decoder. To further accelerate
the detection speed, we introduce an encoder feature shuffle strategy for
multi-frame prediction during inference. In this strategy, we share the
backbone and encoder features across frames, and shuffle the encoder features
for the decoder to generate predictions for multiple frames. The
experiments on the public breast lesion ultrasound video dataset show that our
STNet obtains state-of-the-art detection performance while running at twice
the inference speed. The code and model are available at
https://github.com/AlfredQin/STNet.
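Below is a minimal, hedged PyTorch sketch of the two mechanisms the abstract describes. The `SpatialTemporalDeformableAttention` class follows the single-head deformable-attention formulation popularized by Deformable DETR, extended to sample across frames; the `shuffle_encoder_memory` helper is one plausible reading of the encoder feature shuffle strategy. All names, tensor shapes, and hyperparameters here are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTemporalDeformableAttention(nn.Module):
    """Sketch of deformable attention across frames: each query predicts a
    few sampling offsets per frame, features are bilinearly sampled at those
    locations in every frame, and the samples are fused with learned
    attention weights, giving local spatial-temporal feature fusion."""

    def __init__(self, dim: int = 256, num_frames: int = 4, num_points: int = 4):
        super().__init__()
        self.num_frames = num_frames
        self.num_points = num_points
        # Offsets and attention weights are predicted from the query feature.
        self.offset_proj = nn.Linear(dim, num_frames * num_points * 2)
        self.weight_proj = nn.Linear(dim, num_frames * num_points)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, value):
        # query:      (B, Nq, C)      query features
        # ref_points: (B, Nq, 2)      normalized (x, y) references in [0, 1]
        # value:      (B, T, C, H, W) per-frame feature maps
        B, Nq, C = query.shape
        _, T, _, H, W = value.shape
        P = self.num_points

        offsets = self.offset_proj(query).view(B, Nq, T, P, 2)
        weights = self.weight_proj(query).view(B, Nq, T * P)
        weights = weights.softmax(-1).view(B, Nq, T, P)

        # Per-frame sampling locations, mapped to [-1, 1] for grid_sample.
        scale = torch.tensor([W, H], dtype=query.dtype, device=query.device)
        loc = ref_points[:, :, None, None, :] + offsets / scale
        grid = 2.0 * loc - 1.0                              # (B, Nq, T, P, 2)

        v = self.value_proj(value.permute(0, 1, 3, 4, 2))   # (B, T, H, W, C)
        v = v.permute(0, 1, 4, 2, 3).reshape(B * T, C, H, W)
        grid = grid.permute(0, 2, 1, 3, 4).reshape(B * T, Nq, P, 2)

        sampled = F.grid_sample(v, grid, align_corners=False)  # (B*T, C, Nq, P)
        sampled = sampled.view(B, T, C, Nq, P).permute(0, 3, 1, 4, 2)

        # Weighted sum over frames and points -> fused query features.
        out = (sampled * weights[..., None]).sum(dim=(2, 3))   # (B, Nq, C)
        return self.out_proj(out)


def shuffle_encoder_memory(memory: torch.Tensor) -> list:
    """Hypothetical reading of the encoder feature shuffle: encoder features
    for a clip are computed once, then the frame axis is rotated so each
    frame takes the "current" slot in turn, letting one shared encoder pass
    drive decoder predictions for all T frames."""
    # memory: (B, T, L, C) shared per-frame encoder features.
    T = memory.shape[1]
    return [memory.roll(shifts=-t, dims=1) for t in range(T)]
```

In a Deformable-DETR-style detector, a module like this could sit in every encoder and decoder stage, matching the abstract's claim of deep feature aggregation at each stage, while the rotated encoder memories would let a single shared backbone-plus-encoder pass produce decoder predictions for all frames in a clip.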
Related papers
- Skeleton-Guided Spatial-Temporal Feature Learning for Video-Based Visible-Infrared Person Re-Identification [2.623742123778503]
Video-based visible-infrared person re-identification (VVI-ReID) is challenging due to significant modality feature discrepancies.
We propose a novel Skeleton-guided spatial-temporal feAture leaRning (STAR) method for VVI-ReID.
arXiv Detail & Related papers (2024-11-17T13:18:05Z) - TSdetector: Temporal-Spatial Self-correction Collaborative Learning for Colonoscopy Video Detection [19.00902297385955]
We propose a novel Temporal-Spatial self-correction detector (TSdetector), which integrates temporal-level consistency learning and spatial-level reliability learning to detect objects continuously.
The experimental results on three publicly available polyp video datasets show that TSdetector achieves the highest polyp detection rate and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-09-30T06:19:29Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Point Cloud Video Anomaly Detection Based on Point Spatio-Temporal
Auto-Encoder [1.4340883856076097]
We propose Point Spatio-Temporal Auto-Encoder (PSTAE), an autoencoder framework that detects anomalies in point cloud videos.
Our method sets a new state-of-the-art (SOTA) on the TIMo dataset.
arXiv Detail & Related papers (2023-06-04T10:30:28Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Pedestrian Spatio-Temporal Information Fusion For Video Anomaly
Detection [1.5736899098702974]
An anomaly detection method is proposed that fuses pedestrian spatio-temporal information.
Anomaly detection is performed based on the difference between the predicted frame and the ground truth.
The experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method is superior to the current mainstream video anomaly detection methods.
arXiv Detail & Related papers (2022-11-18T06:41:02Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, the Video Inpainting Detection Network, contains a two-stream encoder-decoder architecture with an attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DSNet) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.