Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language
Segmentation in Echocardiography
- URL: http://arxiv.org/abs/2309.12829v1
- Date: Fri, 22 Sep 2023 12:36:30 GMT
- Title: Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language
Segmentation in Echocardiography
- Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel,
Prasiddha Bhandari, Bishesh Khanal
- Abstract summary: Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding accurate and explainable segmentation.
In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation.
Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images.
- Score: 0.9324036842528547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate segmentation is essential for echocardiography-based assessment of
cardiovascular diseases (CVDs). However, the variability among sonographers and
the inherent challenges of ultrasound images hinder precise segmentation. By
leveraging the joint representation of image and text modalities,
Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual
information, potentially aiding in accurate and explainable segmentation.
However, the lack of readily available data in echocardiography hampers the
training of VLSMs. In this study, we explore using synthetic datasets from
Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography
segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS)
using seven different kinds of language prompts derived from several
attributes, automatically extracted from echocardiography images, segmentation
masks, and their metadata. Our results show improved metrics and faster
convergence when pretraining VLSMs on SDM-generated synthetic images before
finetuning on real images. The code, configs, and prompts are available at
https://github.com/naamiinepal/synthetic-boost.
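The abstract describes a straightforward two-stage recipe: build language prompts from automatically extracted attributes, pretrain the VLSM on SDM-generated synthetic image-mask pairs, then finetune on real echocardiography data. The sketch below illustrates how such a pipeline might look in PyTorch; the prompt template, attribute fields, model forward signature, epoch counts, and learning rates are all illustrative assumptions rather than the authors' actual configuration, which lives in the linked repository.

```python
import torch

def make_prompt(attrs: dict) -> str:
    """Build one language prompt from attributes extracted from the image,
    segmentation mask, and metadata. The fields below are hypothetical
    stand-ins for the attributes from which the paper derives its seven
    prompt variants."""
    return (f"segmentation of the {attrs['structure']} in "
            f"{attrs['view']} echocardiography view during {attrs['phase']}")

def run_epoch(model, loader, optimizer, loss_fn, device="cuda"):
    """One training epoch over (image, mask, prompt) triplets."""
    model.train()
    for images, masks, prompts in loader:
        images, masks = images.to(device), masks.to(device)
        logits = model(images, prompts)   # assumed VLSM forward signature
        loss = loss_fn(logits, masks)     # e.g. Dice + cross-entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def synthetic_boost(model, synthetic_loader, real_loader, loss_fn,
                    pretrain_epochs=30, finetune_epochs=10):
    """Stage 1: pretrain on SDM-generated synthetic pairs.
    Stage 2: finetune the same weights on real echocardiography images."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(pretrain_epochs):
        run_epoch(model, synthetic_loader, opt, loss_fn)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # gentler LR for real data
    for _ in range(finetune_epochs):
        run_epoch(model, real_loader, opt, loss_fn)
    return model
```

For example, make_prompt({"structure": "left ventricle", "view": "apical four-chamber", "phase": "end-diastole"}) would yield "segmentation of the left ventricle in apical four-chamber echocardiography view during end-diastole", which is fed to the VLSM's text encoder alongside the image.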
Related papers
- Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction [62.61209705638161]
There has been growing interest in deep learning-based CMR imaging algorithms.
Deep learning methods require large training datasets.
This dataset includes multi-contrast, multi-view, multi-slice and multi-coil CMR imaging data from 300 subjects.
arXiv Detail & Related papers (2023-09-19T15:14:42Z)
- ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
arXiv Detail & Related papers (2023-08-16T15:19:52Z)
- Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This work presents the first systematic study of transferring Vision-Language Models to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z)
- LOTUS: Learning to Optimize Task-based US representations [39.81131738128329]
Anatomical segmentation of organs in ultrasound images is essential to many clinical applications.
Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance.
In this paper, we propose a novel approach for learning to optimize task-based ultrasound image representations.
arXiv Detail & Related papers (2023-07-29T16:29:39Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Mixed-UNet: Refined Class Activation Mapping for Weakly-Supervised Semantic Segmentation with Multi-scale Inference [28.409679398886304]
We develop a novel model named Mixed-UNet, which has two parallel branches in the decoding phase.
We evaluate the designed Mixed-UNet against several prevalent deep learning-based segmentation approaches on our dataset collected from a local hospital and on public datasets.
arXiv Detail & Related papers (2022-05-06T08:37:02Z)
- Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks [48.732863591145964]
We propose a multi-modal convolutional neural network architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure.
Our results show a prediction accuracy of 76% at the image level on a dataset with 5 different labels.
arXiv Detail & Related papers (2021-10-12T21:22:24Z)