MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
- URL: http://arxiv.org/abs/2410.02458v2
- Date: Fri, 4 Oct 2024 14:19:33 GMT
- Title: MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
- Authors: Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel,
- Abstract summary: This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks.
Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance.
The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index.
- Score: 0.8437187555622164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
Related papers
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical
Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only retains the preservation of both local and global context information but also is capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Multi-scale Hierarchical Vision Transformer with Cascaded Attention
Decoding for Medical Image Segmentation [8.530680502975095]
We introduce a Multi-scale hiERarchical vIsion Transformer (MERIT) backbone network, which improves the generalizability of the model by computing SA at multiple scales.
We also incorporate an attention-based decoder, namely Cascaded Attention Decoding (CASCADE), for further refinement of multi-stage features generated by MERIT.
arXiv Detail & Related papers (2023-03-29T17:58:40Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - Class-Aware Generative Adversarial Transformers for Medical Image
Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z) - Interpretable and synergistic deep learning for visual explanation and
statistical estimations of segmentation of disease features from medical
images [0.0]
Deep learning (DL) models for disease classification or segmentation from medical images are increasingly trained using transfer learning (TL) from unrelated natural world images.
We report detailed comparisons, rigorous statistical analysis and comparisons of widely used DL architecture for binary segmentation after TL.
A free GitHub repository of TII and LMI models, code and more than 10,000 medical images and their Grad-CAM output from this study can be used as starting points for advanced computational medicine.
arXiv Detail & Related papers (2020-11-11T14:08:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.