A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert
knowledge in text supervision
- URL: http://arxiv.org/abs/2308.07898v1
- Date: Tue, 15 Aug 2023 17:39:52 GMT
- Title: A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert
knowledge in text supervision
- Authors: Julio Silva-Rodriguez, Hadi Chakor, Riadh Kobbi, Jose Dolz and Ismail
Ben Ayed
- Abstract summary: We present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding.
We compiled 37 open-access, mostly categorical fundus imaging datasets from various sources.
We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference.
- Score: 17.583536041845402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation vision-language models are currently transforming computer vision,
and are on the rise in medical imaging fueled by their very promising
generalization capabilities. However, the initial attempts to transfer this new
paradigm to medical imaging have shown less impressive performances than those
observed in other domains, due to the significant domain shift and the complex,
expert domain knowledge inherent to medical-imaging tasks. Motivated by the
need for domain-expert foundation models, we present FLAIR, a pre-trained
vision-language model for universal retinal fundus image understanding. To this
end, we compiled 37 open-access, mostly categorical fundus imaging datasets
from various sources, with up to 97 different target conditions and 284,660
images. We integrate the expert's domain knowledge in the form of descriptive
textual prompts, during both pre-training and zero-shot inference, enhancing
the less-informative categorical supervision of the data. This textual expert
knowledge, which we compiled from the relevant clinical literature and
community standards, describes the fine-grained features of the pathologies as
well as the hierarchies and dependencies between them. We report comprehensive
evaluations, which illustrate the benefit of integrating expert knowledge and
the strong generalization capabilities of FLAIR under difficult scenarios with
domain shifts or unseen categories. When adapted with a lightweight linear
probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the
few-shot regimes. Interestingly, FLAIR outperforms more generalist,
larger-scale image-language models by a large margin, which emphasizes the
potential of embedding experts' domain knowledge and the limitations of
generalist models in medical imaging.
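The zero-shot inference with descriptive textual prompts mentioned in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style setup: each category is described by several expert prompts, the prompt embeddings are averaged into one prototype per category, and an image is assigned to the category whose prototype has the highest cosine similarity. The prompt wording, the `toy_encode_text` stand-in encoder, the hand-made image embedding, and the temperature value are all hypothetical choices for illustration; none of them is taken from FLAIR's released code or prompts.

```python
import math

# Hypothetical expert-knowledge prompts (illustrative wording only,
# not the prompts used by the FLAIR authors).
EXPERT_PROMPTS = {
    "no diabetic retinopathy": ["healthy retina", "no relevant lesions"],
    "mild diabetic retinopathy": ["only a few microaneurysms"],
    "proliferative diabetic retinopathy": [
        "neovascularization of the optic disc",
        "haemorrhages and neovascularization",
    ],
}

# Toy vocabulary for the stand-in text encoder (assumption for the demo).
VOCAB = ["healthy", "lesions", "microaneurysms", "haemorrhages", "neovascularization"]


def toy_encode_text(prompt):
    """Stand-in for a real text encoder: counts vocabulary words in the prompt."""
    words = prompt.lower().replace(",", " ").split()
    return [float(words.count(w)) for w in VOCAB]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_prototypes(prompts_by_class, encode_text):
    """Average the normalized embeddings of each category's prompts."""
    prototypes = {}
    for name, prompts in prompts_by_class.items():
        embeddings = [l2_normalize(encode_text(p)) for p in prompts]
        prototypes[name] = l2_normalize(mean_vector(embeddings))
    return prototypes


def zero_shot_predict(image_embedding, prototypes, temperature=0.07):
    """Return category probabilities from temperature-scaled cosine similarity.

    The temperature default is an assumed value, not one reported for FLAIR.
    """
    img = l2_normalize(image_embedding)
    names = list(prototypes)
    sims = [sum(a * b for a, b in zip(img, prototypes[n])) for n in names]
    probs = softmax([s / temperature for s in sims])
    return dict(zip(names, probs))


# Usage: a hand-made "image embedding" that mostly activates the
# microaneurysms dimension of the toy embedding space.
prototypes = class_prototypes(EXPERT_PROMPTS, toy_encode_text)
probabilities = zero_shot_predict([0.1, 0.0, 0.9, 0.2, 0.0], prototypes)
print(max(probabilities, key=probabilities.get))
```

In a real system, `toy_encode_text` and the hand-made vector would be replaced by the outputs of the pre-trained text and image encoders. Averaging several prompts per category is one common way to fold richer descriptions into the category prototype.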
Related papers
- KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models.
In particular, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNETR as the target foundation model.
Within the hidden layer, the hierarchical attention mechanisms are designed to achieve adaptive merging of the target model to the hidden layer feature knowledge of all experts.
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
- LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD).
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
- UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling [26.087595095138305]
UrFound is a retinal foundation model designed to learn universal representations from both multimodal retinal images and domain knowledge.
By training on 180k retinal images, UrFound significantly outperforms the state-of-the-art retinal foundation model trained on up to 1.6 million unlabelled images.
arXiv Detail & Related papers (2024-08-10T19:31:29Z)
- A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
While deep networks have achieved broad success in analyzing natural images, they often fail in unexpected situations when applied to medical scans.
We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex and race, in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Artificial General Intelligence for Medical Imaging Analysis [92.3940918983821]
Large-scale Artificial General Intelligence (AGI) models have achieved unprecedented success in a variety of general domain tasks.
These models face notable challenges arising from the medical field's inherent complexities and unique characteristics.
This review aims to offer insights into the future implications of AGI in medical imaging, healthcare, and beyond.
arXiv Detail & Related papers (2023-06-08T18:04:13Z)
- Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [3.8137985834223502]
Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
arXiv Detail & Related papers (2022-10-09T01:43:08Z)
- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision language models (VLMs).
We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.