A Simple Framework Uniting Visual In-context Learning with Masked Image
Modeling to Improve Ultrasound Segmentation
- URL: http://arxiv.org/abs/2402.14300v3
- Date: Fri, 8 Mar 2024 05:48:41 GMT
- Authors: Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight,
Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash
Rakkunedeth Hareendranathan, Jacob L. Jaremko
- Abstract summary: Visual in-context learning (ICL) is a new and exciting area of research in computer vision.
We propose SimICL, a new and simple visual ICL method that combines ICL image pairing with masked image modeling (MIM) designed for self-supervised learning.
SimICL achieved a remarkably high Dice coefficient (DC) of 0.96 and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and visual ICL models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional deep learning models deal with images one-by-one, requiring
costly and time-consuming expert labeling in the field of medical imaging, and
domain-specific restriction limits model generalizability. Visual in-context
learning (ICL) is a new and exciting area of research in computer vision.
Unlike conventional deep learning, ICL emphasizes the model's ability to adapt
to new tasks quickly based on given examples. Inspired by MAE-VQGAN, we
propose SimICL, a new and simple visual ICL method that combines ICL
image pairing with masked image modeling (MIM) designed for self-supervised
learning. We validated our method on bony structure segmentation in a wrist
ultrasound (US) dataset with limited annotations, where the clinical objective
was to segment bony structures to help with further fracture detection. We used
a test set containing 3822 images from 18 patients for bony region
segmentation. SimICL achieved a remarkably high Dice coefficient (DC) of 0.96
and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and
visual ICL models (maximum DC 0.86 and IoU 0.76), with SimICL improving DC
and IoU by up to 0.10 and 0.16. This remarkably high agreement with limited
manual annotations indicates SimICL could be used for training AI models even
on small US datasets. This could dramatically decrease the human expert time
required for image labeling compared to conventional approaches, and enhance
the real-world use of AI assistance in US image analysis.
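The reported agreement metrics are easy to reproduce for any pair of binary masks, and the in-context input can be assembled as a 2x2 grid in the MAE-VQGAN style (support image and mask on top, query image and a masked slot below). The sketch below is a minimal NumPy illustration; the grid layout, the mask-token value, and the function names are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def make_icl_grid(support_img, support_mask, query_img, mask_token=0.5):
    """Assemble a 2x2 visual in-context grid (MAE-VQGAN style, assumed layout):
    top row = support image and its mask, bottom row = query image and a
    masked-out slot that the model inpaints with the predicted mask."""
    h, w = query_img.shape
    masked_slot = np.full((h, w), mask_token, dtype=np.float32)
    top = np.concatenate([support_img, support_mask], axis=1)
    bottom = np.concatenate([query_img, masked_slot], axis=1)
    return np.concatenate([top, bottom], axis=0)

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient (DC) and Jaccard Index (IoU) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dc = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (np.logical_or(pred, target).sum() + eps)
    return dc, iou
```

A DC of 0.96 and IoU of 0.92 computed this way over the 3822 test images would correspond to the agreement levels reported above.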
Related papers
- A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions.
Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet), with 600× faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
- MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.
Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.
We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z)
- Augmentation is AUtO-Net: Augmentation-Driven Contrastive Multiview Learning for Medical Image Segmentation [3.1002416427168304]
This thesis focuses on retinal blood vessel segmentation tasks.
It provides an extensive literature review of deep learning-based medical image segmentation approaches.
It proposes a novel efficient, simple multiview learning framework.
arXiv Detail & Related papers (2023-11-02T06:31:08Z)
- Multi-scale Multi-site Renal Microvascular Structures Segmentation for Whole Slide Imaging in Renal Pathology [4.743463035587953]
We present Omni-Seg, a novel single dynamic network method that capitalizes on multi-site, multi-scale training data.
We train a singular deep network using images from two datasets, HuBMAP and NEPTUNE.
Our proposed method provides renal pathologists with a powerful computational tool for the quantitative analysis of renal microvascular structures.
arXiv Detail & Related papers (2023-08-10T16:26:03Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities [60.079782224958414]
We present and discuss the opportunities and challenges brought by a new deep learning method that directly maximizes AUC (aka Deep AUC Maximization).
arXiv Detail & Related papers (2021-11-01T15:31:32Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
- Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images [0.0]
Deep learning (DL) models for disease classification or segmentation from medical images are increasingly trained using transfer learning (TL) from unrelated natural world images.
We report detailed comparisons and rigorous statistical analysis of widely used DL architectures for binary segmentation after TL.
A free GitHub repository of TII and LMI models, code and more than 10,000 medical images and their Grad-CAM output from this study can be used as starting points for advanced computational medicine.
arXiv Detail & Related papers (2020-11-11T14:08:17Z)
- Segmentation of Cellular Patterns in Confocal Images of Melanocytic Lesions in vivo via a Multiscale Encoder-Decoder Network (MED-Net) [2.0487455621441377]
"Multiscale-Decoder Network (MED-Net)" provides pixel-wise labeling into classes of patterns in a quantitative manner.
We trained and tested our model on non-overlapping partitions of 117 reflectance confocal microscopy (RCM) mosaics of melanocytic lesions.
arXiv Detail & Related papers (2020-01-03T22:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.