A Spatial-Temporal Deformable Attention based Framework for Breast
Lesion Detection in Videos
- URL: http://arxiv.org/abs/2309.04702v1
- Date: Sat, 9 Sep 2023 07:00:10 GMT
- Title: A Spatial-Temporal Deformable Attention based Framework for Breast
Lesion Detection in Videos
- Authors: Chao Qin and Jiale Cao and Huazhu Fu and Rao Muhammad Anwer and Fahad
Shahbaz Khan
- Abstract summary: We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains state-of-the-art detection performance.
- Score: 107.96514633713034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting breast lesions in videos is crucial for computer-aided diagnosis.
Existing video-based breast lesion detection approaches typically perform
temporal feature aggregation of deep backbone features based on the
self-attention operation. We argue that such a strategy struggles to
effectively perform deep feature aggregation and ignores the useful local
information. To tackle these issues, we propose a spatial-temporal deformable
attention based framework, named STNet. Our STNet introduces a spatial-temporal
deformable attention module to perform local spatial-temporal feature fusion.
The spatial-temporal deformable attention module enables deep feature
aggregation in each stage of both encoder and decoder. To further accelerate
the detection speed, we introduce an encoder feature shuffle strategy for
multi-frame prediction during inference. In our encoder feature shuffle
strategy, we share the backbone and encoder features, and shuffle encoder
features for the decoder to generate the predictions of multiple frames. The
experiments on the public breast lesion ultrasound video dataset show that our
STNet obtains state-of-the-art detection performance while running inference
twice as fast. The code and model are available at
https://github.com/AlfredQin/STNet.
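The encoder feature shuffle idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `predict_clip`, `toy_encode`, and `toy_decode` are hypothetical, the toy encoder and decoder are simple stand-ins, and it assumes that "shuffling" means reordering the shared encoder features so that each frame in turn takes the target position before decoding.

```python
import numpy as np


def toy_encode(frame):
    """Hypothetical stand-in for the shared backbone + encoder.

    Returns one feature vector per frame (the per-channel mean).
    """
    return frame.mean(axis=(0, 1))


def toy_decode(ordered_feats):
    """Hypothetical stand-in for the decoder.

    Assumes the first feature belongs to the target frame and the
    remaining features act as temporal context.
    """
    target, context = ordered_feats[0], ordered_feats[1:]
    return target + 0.5 * np.mean(np.stack(context), axis=0)


def predict_clip(frames, encode, decode):
    """Predict every frame of a clip from features encoded only once."""
    # Backbone and encoder run a single time per frame and are shared.
    feats = [encode(f) for f in frames]
    preds = []
    for i in range(len(feats)):
        # Assumed "shuffle": move frame i to the target position and
        # keep the other frames as context, without re-encoding.
        order = [i] + [j for j in range(len(feats)) if j != i]
        preds.append(decode([feats[j] for j in order]))
    return preds


if __name__ == "__main__":
    clip = [np.full((2, 2, 3), float(k)) for k in range(3)]
    for k, pred in enumerate(predict_clip(clip, toy_encode, toy_decode)):
        print(k, pred)
```

In this sketch the encoder cost is paid once per clip, and only the lighter decoding step is repeated per frame, which is the kind of saving the abstract attributes to the strategy.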
Related papers
- Point Cloud Video Anomaly Detection Based on Point Spatio-Temporal
Auto-Encoder [1.4340883856076097]
We propose Point Spatio-Temporal Auto-Encoder (PSTAE), an autoencoder framework that takes point cloud videos as input and detects anomalies in them.
Our method sets a new state-of-the-art (SOTA) on the TIMo dataset.
arXiv Detail & Related papers (2023-06-04T10:30:28Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Pedestrian Spatio-Temporal Information Fusion For Video Anomaly
Detection [1.5736899098702974]
An anomaly detection method is proposed to integrate the information of pedestrians.
Anomaly detection is realized according to the difference between the output frame and the true value.
The experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method is superior to the current mainstream video anomaly detection methods.
arXiv Detail & Related papers (2022-11-18T06:41:02Z)
- A New Dataset and A Baseline Model for Breast Lesion Detection in
Ultrasound Videos [43.42513012531214]
We first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection.
We propose a clip-level and video-level feature aggregated network (CVA-Net) for addressing breast lesion detection in ultrasound videos.
arXiv Detail & Related papers (2022-07-01T01:37:50Z)
- Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
The framework, CTL, utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, Video Inpainting Detection Network, contains a two-stream encoder-decoder architecture with an attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Multiple Instance-Based Video Anomaly Detection using Deep Temporal
Encoding-Decoding [5.255783459833821]
We propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos.
The proposed approach uses both abnormal and normal video clips during the training phase.
The results show that the proposed method performs similarly to or better than the state-of-the-art solutions for anomaly detection in video surveillance applications.
arXiv Detail & Related papers (2020-07-03T08:22:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.