MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video
- URL: http://arxiv.org/abs/2408.03761v2
- Date: Wed, 30 Oct 2024 12:08:08 GMT
- Title: MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video
- Authors: Xiaoqing Guo, Qianhui Men, J. Alison Noble
- Abstract summary: We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, with a particular focus on fetal ultrasound analysis.
MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement.
Based on reported experiments, MMSummary is estimated to reduce scanning time by approximately 31.5%, suggesting the potential to enhance clinical workflow efficiency.
- Score: 13.231546105751015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual prior. The MMSummary system provides comprehensive summaries for fetal ultrasound examinations and based on reported experiments is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance clinical workflow efficiency.
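The staged design described in the abstract lends itself to a simple orchestration loop. Below is a minimal, library-agnostic sketch of how such a three-stage pipeline could be wired together; the stage functions (detect_keyframes, caption_keyframe, segment_and_measure), the KeyframeSummary container, and the biometry check are illustrative assumptions, not the authors' implementation.
```python
from dataclasses import dataclass, field

@dataclass
class KeyframeSummary:
    frame_index: int
    caption: str
    measurements: dict = field(default_factory=dict)  # e.g. {"head_circumference_mm": 312.4}

def detect_keyframes(frames):
    """Stage 1 (assumed interface): progressively select a concise, non-redundant set of keyframe indices."""
    raise NotImplementedError

def caption_keyframe(frame):
    """Stage 2 (assumed interface): an adapted large language model returns a caption string for the keyframe."""
    raise NotImplementedError

def segment_and_measure(frame, caption):
    """Stage 3 (assumed interface): segment the region of interest guided by the caption text
    and return biometric parameters as a dict."""
    raise NotImplementedError

def summarize_scan(frames):
    """Run the three stages in sequence, mirroring the sonographer's examination workflow."""
    summary = []
    for idx in detect_keyframes(frames):
        caption = caption_keyframe(frames[idx])
        # Only keyframes captioned as fetal biometry proceed to segmentation and measurement.
        measurements = (
            segment_and_measure(frames[idx], caption)
            if "biometry" in caption.lower()
            else {}
        )
        summary.append(KeyframeSummary(idx, caption, measurements))
    return summary
```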
Related papers
- Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-Rays representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
- Breast Ultrasound Report Generation using LangChain [58.07183284468881]
We propose the integration of multiple image analysis tools through a LangChain using Large Language Models (LLM) into the breast reporting process.
Our method can accurately extract relevant features from ultrasound images, interpret them in a clinical context, and produce comprehensive and standardized reports.
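As a rough illustration of this tool-chaining idea (not the paper's actual code, and deliberately avoiding any specific LangChain API), the outputs of several image-analysis tools can be pooled into a prompt for an LLM; the function and argument names below are assumptions.
```python
def generate_breast_us_report(image, tools, llm):
    """Library-agnostic sketch: run each image-analysis tool, then ask an LLM to turn
    the pooled findings into a standardized report.

    tools : dict mapping tool name -> callable(image) returning a finding
    llm   : callable mapping a prompt string -> generated report text
    """
    findings = {name: tool(image) for name, tool in tools.items()}
    prompt = (
        "You are drafting a standardized breast ultrasound report.\n"
        "Findings from the analysis tools:\n"
        + "\n".join(f"- {name}: {result}" for name, result in findings.items())
        + "\nWrite a concise, clinically phrased report."
    )
    return llm(prompt)
```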
arXiv Detail & Related papers (2023-12-05T00:28:26Z)
- Multi-Task Learning Approach for Unified Biometric Estimation from Fetal Ultrasound Anomaly Scans [0.8213829427624407]
We propose a multi-task learning approach to classify the region into head, abdomen and femur.
We were able to achieve a mean absolute error (MAE) of 1.08 mm on head circumference, 1.44 mm on abdomen circumference and 1.10 mm on femur length with a classification accuracy of 99.91%.
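For reference, the mean absolute error quoted above is the average unsigned difference between predicted and ground-truth measurements, in the same unit (millimetres); a minimal computation looks like this, with made-up values for illustration.
```python
def mean_absolute_error(predicted, reference):
    """MAE in the same unit as the inputs (mm for the biometrics above)."""
    assert len(predicted) == len(reference)
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Illustrative head-circumference values in mm (not from the paper):
print(mean_absolute_error([310.2, 295.8, 330.1], [311.0, 294.5, 331.3]))  # ~1.1 mm
```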
arXiv Detail & Related papers (2023-11-16T06:35:02Z)
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z)
- Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Global Multi-modal 2D/3D Registration via Local Descriptors Learning [0.3299877799532224]
We present a novel approach to solve the problem of registration of an ultrasound sweep to a pre-operative image.
We learn dense keypoint descriptors from which we then estimate the registration.
Our approach is evaluated on a clinical dataset of paired MR volumes and ultrasound sequences.
arXiv Detail & Related papers (2022-05-06T18:24:19Z)
- Deep Learning for Ultrasound Beamforming [120.12255978513912]
Beamforming, the process of mapping received ultrasound echoes to the spatial image domain, lies at the heart of the ultrasound image formation chain.
Modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing.
Deep learning methods can play a compelling role in the digital beamforming pipeline.
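To make the beamforming step concrete, the classical baseline that learned methods build on is delay-and-sum: align each channel's echo by its geometric time of flight to the focal point, then sum across channels. The sketch below is a simplified illustration (nearest-sample delays, uniform apodization) and not tied to the survey's specific formulations.
```python
import numpy as np

def delay_and_sum(rf_data, delays_s, fs):
    """Classical delay-and-sum beamforming for a single focal point.

    rf_data  : (n_channels, n_samples) array of received echo samples
    delays_s : per-channel time-of-flight delays in seconds
    fs       : sampling frequency in Hz
    """
    n_channels, n_samples = rf_data.shape
    sample_idx = np.round(delays_s * fs).astype(int)       # nearest-sample delays
    sample_idx = np.clip(sample_idx, 0, n_samples - 1)
    # Sum the delay-aligned samples across channels (uniform apodization weights).
    return rf_data[np.arange(n_channels), sample_idx].sum()
```
In a full image this is evaluated for every pixel; learned beamformers can replace or augment parts of this pipeline, for example by predicting adaptive apodization weights.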
arXiv Detail & Related papers (2021-09-23T15:15:21Z)
- Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging [0.0]
The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans.
The proposed video-summarization technique is a step in this direction.
We propose a new unsupervised reinforcement learning framework with novel rewards.
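The paper's specific rewards are not spelled out in this summary; as a generic illustration only, unsupervised reinforcement-learning summarizers commonly score a selected frame subset on diversity and representativeness of its feature embeddings, along these lines.
```python
import numpy as np

def summary_reward(features, selected):
    """Generic diversity + coverage reward over frame embeddings (not the paper's formulation).

    features : (n_frames, d) array of per-frame feature vectors
    selected : list of indices of frames chosen for the summary
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sel = feats[selected]
    n = len(selected)
    sim = sel @ sel.T
    if n < 2:
        diversity = 1.0
    else:
        mean_pairwise_sim = (sim.sum() - np.trace(sim)) / (n * (n - 1))
        diversity = 1.0 - mean_pairwise_sim          # dissimilarity among selected frames
    coverage = (feats @ sel.T).max(axis=1).mean()    # how well the selection covers all frames
    return diversity + coverage

# Example: reward for picking frames 3, 17 and 58 from 100 random 64-d embeddings.
print(summary_reward(np.random.rand(100, 64), [3, 17, 58]))
```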
arXiv Detail & Related papers (2021-09-03T04:50:35Z)
- FetalNet: Multi-task deep learning framework for fetal ultrasound biometric measurements [11.364211664829567]
We propose an end-to-end multi-task neural network called FetalNet with an attention mechanism and stacked module for fetal ultrasound scan video analysis.
The main goal in fetal ultrasound video analysis is to find proper standard planes to measure the fetal head, abdomen and femur.
Our method called FetalNet outperforms existing state-of-the-art methods in both classification and segmentation in fetal ultrasound video recordings.
arXiv Detail & Related papers (2021-07-14T19:13:33Z)
- Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)