MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video
- URL: http://arxiv.org/abs/2408.03761v2
- Date: Wed, 30 Oct 2024 12:08:08 GMT
- Title: MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video
- Authors: Xiaoqing Guo, Qianhui Men, J. Alison Noble
- Abstract summary: We present MMSummary, the first automated multimodal summary generation system for medical imaging video, with a particular focus on fetal ultrasound analysis.
MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement.
Based on reported experiments, the system is estimated to reduce scanning time by approximately 31.5%, suggesting the potential to enhance clinical workflow efficiency.
- Score: 13.231546105751015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual prior. The MMSummary system provides comprehensive summaries for fetal ultrasound examinations and, based on reported experiments, is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance clinical workflow efficiency.
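As a hedged illustration, the three-stage pipeline described in the abstract can be sketched as plain Python control flow. The stage functions, the `Keyframe` record, and the keyframe-selection rule below are hypothetical stand-ins for the paper's learned components, not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical frame record; the real system operates on ultrasound video frames.
@dataclass
class Keyframe:
    index: int
    caption: str = ""
    biometry: dict = field(default_factory=dict)

def detect_keyframes(video):
    """Stage 1: progressively select a concise, non-redundant set of keyframes.
    Stand-in rule: keep every 10th frame index."""
    return [Keyframe(i) for i in range(0, len(video), 10)]

def caption_keyframe(frame):
    """Stage 2: a large language model would generate the caption; stubbed here."""
    frame.caption = "fetal biometry" if frame.index % 20 == 0 else "anatomy view"
    return frame

def segment_and_measure(frame):
    """Stage 3: segment the region of interest using the caption as a textual
    prior, then estimate biometric parameters (stubbed)."""
    frame.biometry = {"head_circumference_mm": None}
    return frame

def mmsummary_pipeline(video):
    summary = []
    for frame in detect_keyframes(video):
        frame = caption_keyframe(frame)
        # Only keyframes captioned as fetal biometry proceed to measurement.
        if frame.caption == "fetal biometry":
            frame = segment_and_measure(frame)
        summary.append(frame)
    return summary

frames = mmsummary_pipeline(list(range(100)))  # 100 dummy frames
print(len(frames))  # 10 keyframes selected
```

The gate between stage 2 and stage 3 mirrors the abstract: measurement runs only for keyframes whose caption marks them as fetal biometry.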
Related papers
- Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos [22.437678884189697]
This study proposes a novel video classification method based on CNN and LSTM.
It reduces CNN-extracted image features to 1x512 dimension, followed by sorting and compressing feature vectors for LSTM training.
Experimental results demonstrate that our variable-frame CNNLSTM method outperforms other approaches across all metrics.
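One reading of the preprocessing described above, sketched in plain Python with hypothetical choices: the pooling rule, the norm-based sort order, and the fixed sequence length `MAX_FRAMES` are illustrative assumptions, not details from the paper.

```python
import math

FEATURE_DIM = 512      # target per-frame feature size named in the abstract
MAX_FRAMES = 16        # hypothetical fixed sequence length for the LSTM

def reduce_to_512(raw_feature):
    """Stand-in for the CNN head: average-pool an arbitrary-length vector
    down to a 1x512 feature (zero-pads if the input is shorter)."""
    if len(raw_feature) < FEATURE_DIM:
        return raw_feature + [0.0] * (FEATURE_DIM - len(raw_feature))
    bucket = len(raw_feature) / FEATURE_DIM
    out = []
    for i in range(FEATURE_DIM):
        lo, hi = int(i * bucket), int((i + 1) * bucket)
        out.append(sum(raw_feature[lo:hi]) / max(1, hi - lo))
    return out

def sort_and_compress(features):
    """Order frame features (here, by L2 norm) and truncate/zero-pad the
    variable-length sequence to a fixed number of frames for the LSTM."""
    ordered = sorted(features, key=lambda f: math.sqrt(sum(x * x for x in f)))
    ordered = ordered[:MAX_FRAMES]
    while len(ordered) < MAX_FRAMES:
        ordered.append([0.0] * FEATURE_DIM)
    return ordered

# Five dummy frames whose raw CNN features are 1024-dimensional.
video_features = [reduce_to_512([float(i)] * 1024) for i in range(5)]
sequence = sort_and_compress(video_features)
print(len(sequence), len(sequence[0]))  # 16 512
```

Padding to a fixed length is one common way to feed variable-frame videos to a recurrent model in fixed-size batches.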
arXiv Detail & Related papers (2025-02-17T06:35:37Z)
- REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning [0.7499722271664147]
A novel framework is proposed to perform real-time ego-motion tracking for endoscopes.
A multi-modal visual feature learning network is proposed to perform relative pose prediction.
The absolute pose of the endoscope is calculated based on relative poses.
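Accumulating relative poses into an absolute pose is a standard composition of homogeneous transforms; the minimal sketch below uses pure-Python 4x4 matrices and a translation-only example, and is not the paper's method.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices represented as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def identity4():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def accumulate_poses(relative_poses):
    """Chain per-step relative transforms T_i into absolute poses:
    T_abs(n) = T_1 @ T_2 @ ... @ T_n."""
    absolute, current = [], identity4()
    for rel in relative_poses:
        current = matmul4(current, rel)
        absolute.append(current)
    return absolute

def translation(dx, dy, dz):
    """Homogeneous transform for a pure translation."""
    t = identity4()
    t[0][3], t[1][3], t[2][3] = dx, dy, dz
    return t

# Three 1 mm forward steps along z accumulate to 3 mm of absolute translation.
poses = accumulate_poses([translation(0.0, 0.0, 1.0)] * 3)
print(poses[-1][2][3])  # 3.0
```

In practice the relative poses would also carry rotation, and drift accumulates with the chain length, which is why such trackers are often evaluated on absolute trajectory error.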
arXiv Detail & Related papers (2025-01-30T03:58:41Z)
- Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis [9.530028450239394]
MLVAS integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data.
It features an advanced strobing video extraction module that specifically identifies strobing frames from laryngeal videostroboscopy.
arXiv Detail & Related papers (2024-09-05T14:56:38Z)
- Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
- Breast Ultrasound Report Generation using LangChain [58.07183284468881]
We propose the integration of multiple image analysis tools, chained through LangChain with Large Language Models (LLMs), into the breast reporting process.
Our method can accurately extract relevant features from ultrasound images, interpret them in a clinical context, and produce comprehensive and standardized reports.
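The tool-chaining idea can be sketched in plain Python; note this is NOT the real LangChain API, and the tool names, stubbed findings, and report text below are all hypothetical.

```python
# Hypothetical image-analysis tools chained ahead of an LLM-style report writer.
def lesion_detector(image):
    """Stubbed detector: would run on the ultrasound frame."""
    return {"lesion_found": True, "size_mm": 7.2}

def birads_classifier(findings):
    """Stubbed classifier: assigns an illustrative BI-RADS category."""
    findings["birads"] = 4 if findings["lesion_found"] else 1
    return findings

def report_writer(findings):
    """Stand-in for the LLM: turns structured findings into a report line."""
    return (f"Lesion detected ({findings['size_mm']} mm); "
            f"BI-RADS category {findings['birads']}.")

def run_chain(image, tools):
    """Pass each tool's output to the next, LangChain-style."""
    result = image
    for tool in tools:
        result = tool(result)
    return result

report = run_chain("breast_us_frame.png",
                   [lesion_detector, birads_classifier, report_writer])
print(report)
```

The design point is that each tool emits structured findings, so the final language-model step only has to verbalize them rather than interpret raw pixels.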
arXiv Detail & Related papers (2023-12-05T00:28:26Z)
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing informative patches, selected according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Global Multi-modal 2D/3D Registration via Local Descriptors Learning [0.3299877799532224]
We present a novel approach to solve the problem of registration of an ultrasound sweep to a pre-operative image.
We learn dense keypoint descriptors from which we then estimate the registration.
Our approach is evaluated on a clinical dataset of paired MR volumes and ultrasound sequences.
arXiv Detail & Related papers (2022-05-06T18:24:19Z)
- Deep Learning for Ultrasound Beamforming [120.12255978513912]
Beamforming, the process of mapping received ultrasound echoes to the spatial image domain, lies at the heart of the ultrasound image formation chain.
Modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing.
Deep learning methods can play a compelling role in the digital beamforming pipeline.
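For context on what sits in that pipeline, here is a minimal delay-and-sum beamformer, the classical baseline that deep-learning beamformers augment or replace. Integer-sample delays and the two-channel echo example are simplifying assumptions; real systems compute geometric delays and interpolate fractional samples.

```python
def delay_and_sum(channel_data, delays):
    """Classic delay-and-sum beamforming: advance each receive channel by
    its arrival delay (in whole samples here), then sum across the aperture
    so echoes from the focal point add coherently."""
    n_samples = len(channel_data[0])
    beamformed = []
    for t in range(n_samples):
        acc = 0.0
        for ch, delay in zip(channel_data, delays):
            idx = t + delay  # compensate the later arrival on this channel
            if 0 <= idx < n_samples:
                acc += ch[idx]
        beamformed.append(acc)
    return beamformed

# Two channels carrying the same echo, offset by 3 samples; compensating the
# delay makes the echoes add coherently at t = 5.
ch0 = [0.0] * 16
ch0[5] = 1.0                        # echo arrives at sample 5
ch1 = [0.0] * 16
ch1[8] = 1.0                        # same echo arrives 3 samples later
out = delay_and_sum([ch0, ch1], delays=[0, 3])
print(out.index(max(out)), max(out))  # 5 2.0
```

Learned beamformers typically replace the fixed sum (or the per-channel weighting around it) with a network trained on channel data, while keeping the delay geometry.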
arXiv Detail & Related papers (2021-09-23T15:15:21Z)
- FetalNet: Multi-task deep learning framework for fetal ultrasound biometric measurements [11.364211664829567]
We propose an end-to-end multi-task neural network called FetalNet with an attention mechanism and stacked module for fetal ultrasound scan video analysis.
The main goal in fetal ultrasound video analysis is to find proper standard planes to measure the fetal head, abdomen and femur.
Our method called FetalNet outperforms existing state-of-the-art methods in both classification and segmentation in fetal ultrasound video recordings.
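As a worked illustration of the measurement step on such standard planes: clinical head circumference is commonly reported as the perimeter of an ellipse fitted to the segmented skull contour. The helper below uses Ramanujan's perimeter approximation and hypothetical semi-axes; it is not FetalNet's code.

```python
import math

def head_circumference_mm(a_mm, b_mm):
    """Head circumference from the semi-axes (a, b) of an ellipse fitted to
    the segmented skull contour, via Ramanujan's approximation:
    P ~ pi*(a+b)*(1 + 3h / (10 + sqrt(4 - 3h))), h = ((a-b)/(a+b))^2.
    Exact for a circle (a == b)."""
    h = ((a_mm - b_mm) ** 2) / ((a_mm + b_mm) ** 2)
    return math.pi * (a_mm + b_mm) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))

# Hypothetical semi-axes (mm) measured on a standard head plane.
hc = head_circumference_mm(60.0, 45.0)
print(f"HC = {hc:.1f} mm")
```

Abdominal circumference is computed the same way from its fitted ellipse, while femur length is a straight-line distance between the detected bone endpoints.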
arXiv Detail & Related papers (2021-07-14T19:13:33Z)
- Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.