Enhancing Representation in Medical Vision-Language Foundation Models
via Multi-Scale Information Extraction Techniques
- URL: http://arxiv.org/abs/2401.01583v2
- Date: Mon, 26 Feb 2024 10:35:03 GMT
- Title: Enhancing Representation in Medical Vision-Language Foundation Models
via Multi-Scale Information Extraction Techniques
- Authors: Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong
Liang, Guangming Shi, Hairong Zheng, Shanshan Wang
- Abstract summary: We propose a method that effectively exploits multi-scale information to enhance the performance of medical foundation models.
We evaluate the effectiveness of the proposed method on six open-source datasets across different clinical tasks.
- Score: 41.078761802053535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of medical vision-language foundation models has attracted
significant attention in the field of medicine and healthcare due to their
promising prospects in various clinical applications. While previous studies
have commonly focused on feature learning at a single scale, investigation
into integrating multi-scale information is lacking, which may limit the
potential for mutual reinforcement among these features. This paper
aims to bridge this gap by proposing a method that effectively exploits
multi-scale information to enhance the performance of medical foundation
models. The proposed method simultaneously exploits features at the local,
instance, modality, and global levels, facilitating comprehensive
representation learning within the models. We evaluate the effectiveness of the
proposed method on six open-source datasets across different clinical tasks,
demonstrating its ability to enhance the performance of medical foundation
models.
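The multi-level idea in the abstract can be illustrated with a minimal sketch: a fine-grained (local) alignment score between image patch features and text token features, fused with a coarse (global) alignment of pooled features. All function names, the feature shapes, and the weighting scheme below are assumptions for illustration, not the authors' implementation; a full method would add instance- and modality-level objectives trained contrastively.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize feature vectors so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def local_alignment(patches, tokens):
    # patches: (P, d) image patch features; tokens: (T, d) text token features.
    # For each token, take its best-matching patch, then average (fine-grained score).
    sim = l2_normalize(tokens) @ l2_normalize(patches).T  # (T, P) cosine similarities
    return float(sim.max(axis=1).mean())

def global_alignment(patches, tokens):
    # Pool each modality into a single vector and compare (coarse-grained score).
    img = l2_normalize(patches.mean(axis=0))
    txt = l2_normalize(tokens.mean(axis=0))
    return float(img @ txt)

def multi_scale_score(patches, tokens, weights=(0.5, 0.5)):
    # Weighted fusion of the local and global alignment scores;
    # the equal weights here are an arbitrary illustrative choice.
    return (weights[0] * local_alignment(patches, tokens)
            + weights[1] * global_alignment(patches, tokens))
```

For perfectly matched features the fused score is 1.0, and it decreases as either the fine-grained or the pooled representations diverge, which is the intuition behind combining scales rather than relying on a single one.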
Related papers
- Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data [3.22071437711162]
The research explores the utilization of a deep learning model employing an attention mechanism in medical text mining.
It aims to enhance the model's capability to identify essential medical information by incorporating deep learning and attention mechanisms.
arXiv Detail & Related papers (2024-05-23T00:20:14Z)
- OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine [55.29668193415034]
We present OpenMEDLab, an open-source platform for multi-modality foundation models.
It encapsulates solutions of pioneering attempts in prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications.
It opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, etc.
arXiv Detail & Related papers (2024-02-28T03:51:02Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- Review of multimodal machine learning approaches in healthcare [0.0]
Clinicians rely on a variety of data sources to make informed decisions.
Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data.
arXiv Detail & Related papers (2024-02-04T12:21:38Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects [2.1070612998322438]
The paper explores the transformative potential of multimodal models for clinical predictions.
Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist.
arXiv Detail & Related papers (2023-11-04T05:42:51Z)
- Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks.
These models facilitate contextual reasoning, generalization, and prompt capabilities at test time.
Capitalizing on advances in computer vision, the medical imaging community has also shown growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
- Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning [8.717599327516822]
MaCo is a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks.
We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phrase grounding.
arXiv Detail & Related papers (2023-09-12T01:29:37Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-source a strong MedISeg repository in which each component is plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach that exploits structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model, enabling it to reason with knowledge as a supplement to the input image and text.
Third, we guide the model to emphasize the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.