On the Relevance of Temporal Features for Medical Ultrasound Video
Recognition
- URL: http://arxiv.org/abs/2310.10453v1
- Date: Mon, 16 Oct 2023 14:35:29 GMT
- Title: On the Relevance of Temporal Features for Medical Ultrasound Video
Recognition
- Authors: D. Hudson Smith, John Paul Lineberger, George H. Baker
- Abstract summary: We propose a novel multi-head attention architecture to achieve better sample efficiency on common ultrasound tasks.
We compare the performance of our architecture to an efficient 3D CNN video recognition model in two settings.
These results suggest that expressive time-independent models may be more effective than state-of-the-art video recognition models for some common ultrasound tasks in the low-data regime.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many medical ultrasound video recognition tasks involve identifying key
anatomical features regardless of when they appear in the video, suggesting that
modeling such tasks may not benefit from temporal features. Correspondingly,
model architectures that exclude temporal features may have better sample
efficiency. We propose a novel multi-head attention architecture that
incorporates these hypotheses as inductive priors to achieve better sample
efficiency on common ultrasound tasks. We compare the performance of our
architecture to an efficient 3D CNN video recognition model in two settings:
one where we expect not to require temporal features and one where we do. In
the former setting, our model outperforms the 3D CNN - especially when we
artificially limit the training data. In the latter, the outcome reverses.
These results suggest that expressive time-independent models may be more
effective than state-of-the-art video recognition models for some common
ultrasound tasks in the low-data regime.
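The abstract does not spell out the architecture, but the core inductive prior (frame order should not matter) can be illustrated. Below is a minimal PyTorch sketch assuming per-frame features from any 2D backbone; the module name and hyperparameters are hypothetical, and this is not the authors' implementation. Because the pooling uses no positional encoding, its output is invariant to frame permutations, which is the time-independence the abstract argues helps sample efficiency.
```python
# Minimal sketch (hypothetical, not the paper's code): time-independent
# multi-head attention pooling over per-frame features.
import torch
import torch.nn as nn

class TimeIndependentAttentionPool(nn.Module):
    def __init__(self, feat_dim=512, num_heads=8, num_classes=2):
        super().__init__()
        # A single learned query attends over the set of frame embeddings.
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim), e.g. from a 2D CNN
        # applied to each frame independently. No positional encoding is
        # added, so the output is invariant to the order of the frames.
        q = self.query.expand(frame_feats.size(0), -1, -1)
        pooled, _ = self.attn(q, frame_feats, frame_feats)
        return self.classifier(pooled.squeeze(1))

# Usage: 4 clips of 16 frames with 512-dim per-frame features.
logits = TimeIndependentAttentionPool()(torch.randn(4, 16, 512))
print(logits.shape)  # torch.Size([4, 2])
```
In contrast, a 3D CNN convolves across the time axis, so its predictions depend on frame order; the paper's experiments probe when that extra temporal expressivity helps and when it costs sample efficiency.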
Related papers
- Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z) - Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem (see the sketch after this list).
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - Activity Detection in Long Surgical Videos using Spatio-Temporal Models [1.2400116527089995]
In this paper, we investigate both state-of-the-art activity recognition models and temporal models.
We benchmark these models on a large-scale activity recognition dataset in the operating room with over 800 full-length surgical videos.
We show that even in the case of limited labeled data, we can outperform the existing work by benefiting from models pre-trained on other tasks.
arXiv Detail & Related papers (2022-05-05T17:34:33Z) - Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z) - BRAIN2DEPTH: Lightweight CNN Model for Classification of Cognitive
States from EEG Recordings [0.0]
This paper proposes a simple, lightweight CNN model to classify cognitive states from EEG recordings.
We develop a novel two-stage pipeline to learn distinct cognitive representations.
We attain comparable performance utilizing less than 4% of the parameters of other models.
arXiv Detail & Related papers (2021-06-12T05:06:20Z) - Temporal-Spatial Feature Pyramid for Video Saliency Detection [2.578242050187029]
We propose a 3D fully convolutional encoder-decoder architecture for video saliency detection.
Our model is simple yet effective, and can run in real time.
arXiv Detail & Related papers (2021-05-10T09:14:14Z) - On the Post-hoc Explainability of Deep Echo State Networks for Time
Series Forecasting, Image and Video Classification [63.716247731036745]
Echo State Networks have attracted much attention over time, mainly due to the simplicity and computational efficiency of their learning algorithm.
This work addresses this issue by conducting an explainability study of Echo State Networks when applied to learning tasks with time series, image and video data.
Specifically, the study proposes three different techniques capable of eliciting understandable information about the knowledge grasped by these recurrent models.
arXiv Detail & Related papers (2021-02-17T08:56:33Z) - A Real-time Action Representation with Temporal Encoding and Deep
Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy and runs 2 times faster at inference, with a model requiring less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
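As an aside on the "Unifying Flow, Stereo and Depth Estimation" entry above: dense correspondence matching can be read as comparing per-pixel features across two views and taking the softmax-weighted expected match location. The sketch below is an illustrative, hypothetical rendering of that idea, not that paper's model.
```python
# Hypothetical sketch: dense correspondence matching between two feature maps.
import torch

def dense_correspondence(feat1, feat2):
    """feat1, feat2: (H, W, D) per-pixel feature maps of two views."""
    H, W, D = feat1.shape
    f1 = feat1.reshape(H * W, D)
    f2 = feat2.reshape(H * W, D)
    # Pairwise similarity between every pixel in view 1 and every pixel in view 2.
    corr = f1 @ f2.t() / D ** 0.5               # (H*W, H*W)
    match = corr.softmax(dim=-1)                # matching distribution per pixel
    # Expected 2D coordinates of each pixel's correspondence in view 2.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(H * W, 2).float()
    coords = match @ grid                       # (H*W, 2)
    flow = coords - grid                        # displacement field
    return flow.reshape(H, W, 2)

flow = dense_correspondence(torch.randn(8, 8, 64), torch.randn(8, 8, 64))
print(flow.shape)  # torch.Size([8, 8, 2])
```
Read as optical flow, the displacement is 2D; restricting matches to the same scanline (stereo) or to a known epipolar geometry (depth) specializes the same matching formulation to the other two tasks.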
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.