Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
- URL: http://arxiv.org/abs/2307.07246v2
- Date: Mon, 17 Jul 2023 15:02:26 GMT
- Title: Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
- Authors: Xiaofei Chen, Yuting He, Cheng Xue, Rongjun Ge, Shuo Li, Guanyu Yang
- Abstract summary: We propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo).
KoBo integrates clinical knowledge into the learning of vision-language semantic consistency.
Experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness.
- Score: 6.582001681307021
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Foundation models based on pre-training technology have significantly
advanced artificial intelligence from theory to practical applications.
These models have made computer-aided diagnosis feasible for widespread use.
Medical contrastive vision-language pre-training, which does not require human
annotations, is an effective approach for guiding representation learning
using the descriptive information in diagnostic reports. However, the
effectiveness of pre-training is limited by large-scale semantic overlap and
shifting problems in the medical field. To address these
issues, we propose the Knowledge-Boosting Contrastive Vision-Language
Pre-training framework (KoBo), which integrates clinical knowledge into the
learning of vision-language semantic consistency. The framework uses an
unbiased, open-set sample-wise knowledge representation to measure negative
sample noise and supplement the correspondence between vision-language mutual
information and clinical knowledge. Extensive experiments validate the effect
of our framework on eight tasks including classification, segmentation,
retrieval, and semantic relatedness, achieving comparable or better performance
in zero-shot or few-shot settings. Our code is available at
https://github.com/ChenXiaoFei-CS/KoBo.
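The abstract describes KoBo only at a high level: a sample-wise knowledge representation is used to gauge how noisy each negative pair is, since in medical reports many nominally negative image-text pairs share the same findings (the semantic overlap problem). As a rough illustration only, the minimal sketch below shows one way such an idea could plug into a CLIP-style InfoNCE loss, by down-weighting each negative according to the cosine similarity of hypothetical per-sample knowledge embeddings. The function names `cosine` and `contrastive_loss`, the weighting rule `1 - clamp(cos(k_i, k_j), 0, 1)`, and the temperature 0.07 are all assumptions made for this example; this is not KoBo's actual formulation, which is available in the authors' repository.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    image_emb: List[Vector],
    text_emb: List[Vector],
    knowledge_emb: Optional[List[Vector]] = None,
    temperature: float = 0.07,
) -> float:
    """Image-to-text InfoNCE averaged over the batch (illustrative sketch).

    Pair i (image_emb[i], text_emb[i]) is the positive; every other text j is
    a negative. If knowledge_emb is given (hypothetical per-sample knowledge
    vectors, an assumption of this sketch), negative j is down-weighted by
    1 - clamp(cos(k_i, k_j), 0, 1), so a "negative" report that shares
    clinical knowledge with the anchor contributes less to the denominator.
    """
    n = len(image_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_emb[i], text_emb[j]) / temperature for j in range(n)]
        weights = []
        for j in range(n):
            if j == i or knowledge_emb is None:
                weights.append(1.0)  # positive pair, or plain InfoNCE
            else:
                overlap = min(1.0, max(0.0, cosine(knowledge_emb[i], knowledge_emb[j])))
                weights.append(1.0 - overlap)  # hypothetical noise weighting
        # Subtract the max logit before exponentiating for numerical stability.
        m = max(logits)
        denom = sum(w * math.exp(logit - m) for w, logit in zip(weights, logits))
        # Loss term: -log(exp(l_i) / sum_j w_j * exp(l_j)); denom >= exp(l_i - m) > 0.
        total += -((logits[i] - m) - math.log(denom))
    return total / n
```

Passing `knowledge_emb=None` recovers ordinary CLIP-style InfoNCE. Because the weights lie in [0, 1] and only scale the negative terms in the denominator, the weighted loss is never larger than the unweighted one; negatives whose knowledge vectors match the anchor's drop out entirely.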
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization [4.634780391920529]
We propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts.
Our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models.
Our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI.
arXiv Detail & Related papers (2024-03-14T14:02:01Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement [9.347971487478038]
This paper develops a novel vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method.
Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings.
arXiv Detail & Related papers (2024-01-21T07:57:04Z)
- Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z)
- Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text.
Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
- Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer [61.34424171458634]
We study whether integrating visual knowledge into a language model can fill the gap.
Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
arXiv Detail & Related papers (2022-03-14T22:02:40Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge [59.87823082513752]
This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers.
We empirically study the relevance of various KBs to multiple tasks and benchmarks.
The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
arXiv Detail & Related papers (2021-01-15T08:37:55Z)
- A Practical Approach towards Causality Mining in Clinical Text using Active Transfer Learning [2.6125458645126907]
Causality mining is an active research area that requires state-of-the-art natural language processing techniques.
This work aims to create a framework that converts clinical text into causal knowledge.
arXiv Detail & Related papers (2020-12-10T06:51:13Z)