Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
- URL: http://arxiv.org/abs/2401.11421v2
- Date: Wed, 4 Sep 2024 15:01:15 GMT
- Title: Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
- Authors: Weijian Huang, Cheng Li, Hao Yang, Jiarun Liu, Yong Liang, Hairong Zheng, Shanshan Wang
- Abstract summary: This paper develops a novel vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method.
Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings.
- Score: 9.347971487478038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, vision-language representation learning has made remarkable advancements in building medical foundation models, holding immense potential for transforming the landscape of clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes contain redundant descriptions, making it difficult for representation learning to capture the key semantic information. This paper develops a novel iterative vision-language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. In particular, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to learn progressively, starting from a general understanding of the patient's condition based on the raw reports and gradually refining and extracting the critical information essential to fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for diverse clinical applications.
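To make the refinement idea concrete, below is a minimal Python sketch of dictionary-driven report refinement: sentences are scored by how many clinical-dictionary terms they mention, and only the highest-scoring ones are kept. The term list, sentence splitter, and `refine_report` helper are illustrative assumptions; the paper's actual dictionary and its two model-optimized knowledge-enhancement metrics are not reproduced here.

```python
import re

# Placeholder clinical vocabulary; the paper builds a much richer dictionary.
CLINICAL_DICTIONARY = {"pneumonia", "effusion", "cardiomegaly",
                       "atelectasis", "consolidation", "edema"}

def refine_report(report: str, keep_top: int = 3) -> str:
    """Keep the sentences that mention the most clinical-dictionary terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    scored = [(sum(w.lower().strip(".,") in CLINICAL_DICTIONARY for w in s.split()), s)
              for s in sentences]
    # Keep the highest-scoring sentences, preserving their original order.
    keep = {s for _, s in sorted(scored, key=lambda x: -x[0])[:keep_top]}
    return " ".join(s for s in sentences if s in keep)

print(refine_report("Heart size is normal. There is a right pleural effusion. "
                    "No focal consolidation. Lines and tubes are unchanged."))
```

In the actual framework this refinement would be applied iteratively, with the knowledge-enhancement metrics steering which content survives each round.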
Related papers
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
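As a rough illustration of the retrieval side, here is a toy sketch that ranks pre-summarized KG communities by token overlap with a patient context and splices the winners into an LLM prompt. The `retrieve_communities` helper, the community summaries, and the prompt template are all hypothetical; KARE's actual community detection, retrieval, and reasoning pipeline is far more involved.

```python
def retrieve_communities(patient_context, communities, k=2):
    """Rank pre-summarized KG communities by naive token overlap with the context."""
    ctx = set(patient_context.lower().split())
    ranked = sorted(communities.items(),
                    key=lambda kv: -len(ctx & set(kv[1].lower().split())))
    return ranked[:k]

# Hypothetical community summaries; real communities come from KG clustering.
communities = {
    "heart_failure": "heart failure reduced ejection fraction diuretics readmission risk",
    "sepsis": "sepsis infection lactate antibiotics mortality risk",
}
context = "elderly patient with heart failure and low ejection fraction"
evidence = "\n".join(f"- {name}: {summary}"
                     for name, summary in retrieve_communities(context, communities))
prompt = (f"Patient context: {context}\n"
          f"Relevant knowledge:\n{evidence}\n"
          f"Predict readmission risk and explain your reasoning.")
print(prompt)
```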
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations, and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
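A minimal sketch of how three modalities might be aligned pairwise with a symmetric InfoNCE loss follows; the encoders, embedding dimension, and equal loss weighting are assumptions, not ViKL's actual design.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings (matched by index)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Stand-in embeddings from hypothetical visual, knowledge, and linguistic encoders.
v, k, l = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
loss = info_nce(v, l) + info_nce(v, k) + info_nce(k, l)  # align all three pairs
print(loss.item())
```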
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization [1.8450534779202723]
This study proposes RadBARTsum, a domain-specific adaptation of the BART model for abstractive radiology report summarization.
The approach involves two main steps: 1) re-training the BART model on a large corpus of radiology reports using a novel entity masking strategy to improve biomedical domain knowledge learning, and 2) fine-tuning the model for the summarization task using the Findings and Background sections to predict the Impression section.
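Below is a rough sketch of the entity-masking idea from step 1: known clinical entity mentions are replaced with BART's `<mask>` token so the model must reconstruct them during denoising pre-training. The entity list and `mask_entities` helper are placeholders; the paper's entity recognizer and training loop are not shown.

```python
def mask_entities(report, entities, mask_token="<mask>"):
    """Replace clinical entity mentions with the mask token (word-level, naive)."""
    return " ".join(mask_token if w.lower().strip(".,") in entities else w
                    for w in report.split())

entities = {"effusion", "pneumothorax", "cardiomegaly"}  # placeholder entity list
src = mask_entities("Small left effusion without pneumothorax.", entities)
print(src)  # "Small left <mask> without <mask>"
# The masked text would be the encoder input and the original report the target
# during denoising re-training, before fine-tuning Findings+Background -> Impression.
```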
arXiv Detail & Related papers (2024-06-05T08:43:11Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
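As one illustrative piece, here is a simplified sketch of category-level contrastive learning, where samples sharing an expert-assigned label are treated as positives (a SupCon-style objective). MLIP's full loss also includes global image-text and token-knowledge-patch alignment terms not shown here.

```python
import torch
import torch.nn.functional as F

def category_contrastive(emb, labels, t=0.1):
    """SupCon-style loss: samples sharing a label are positives for each other."""
    n = emb.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    emb = F.normalize(emb, dim=-1)
    sim = (emb @ emb.t() / t).masked_fill(eye, float("-inf"))  # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)                  # avoid -inf * 0 = nan
    pos = (labels[:, None] == labels[None, :]).float().masked_fill(eye, 0.0)
    has_pos = pos.sum(1) > 0
    return (-(log_prob * pos).sum(1)[has_pos] / pos.sum(1)[has_pos]).mean()

emb = torch.randn(6, 128)                  # stand-in multimodal embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2])  # expert-assigned disease categories
print(category_contrastive(emb, labels).item())
```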
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
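A schematic sketch of that fusion pattern follows: projected CNN patch features and demographic token embeddings are concatenated into the memory of a transformer decoder that generates the report. The module names, sizes, and two-token demographic encoding are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DemographicConditionedDecoder(nn.Module):
    """Hypothetical fusion: CNN patch features + demographic tokens as decoder memory."""
    def __init__(self, d_model=256, vocab=1000):
        super().__init__()
        self.visual_proj = nn.Linear(512, d_model)   # project CNN features to d_model
        self.demo_embed = nn.Embedding(10, d_model)  # e.g. age-bucket and sex tokens
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.tok_embed = nn.Embedding(vocab, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, visual_feats, demo_ids, report_ids):
        # Concatenate both modalities into the memory the decoder attends over.
        memory = torch.cat([self.visual_proj(visual_feats),
                            self.demo_embed(demo_ids)], dim=1)
        return self.out(self.decoder(self.tok_embed(report_ids), memory))

model = DemographicConditionedDecoder()
logits = model(torch.randn(2, 49, 512),          # flattened 7x7 CNN feature grid
               torch.randint(0, 10, (2, 2)),     # two demographic tokens
               torch.randint(0, 1000, (2, 20)))  # report tokens generated so far
print(logits.shape)  # torch.Size([2, 20, 1000])
```

For real autoregressive training, a causal target mask would also be passed to the decoder; it is omitted here for brevity.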
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training [6.582001681307021]
We propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo).
KoBo integrates clinical knowledge into the learning of vision-language semantic consistency.
Experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness.
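One way such knowledge could shape contrastive training is by down-weighting likely false negatives, i.e., pairs whose reports share clinical concepts with the anchor. The sketch below implements that heuristic; it is an assumption-laden stand-in, not KoBo's actual mechanism.

```python
import torch
import torch.nn.functional as F

def knowledge_weighted_nce(img, txt, concept_sim, t=0.07):
    """concept_sim[i, j] in [0, 1]: clinical-concept overlap between reports i and j."""
    img, txt = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
    logits = img @ txt.t() / t
    # Suppress off-diagonal pairs with high concept overlap (likely false negatives).
    off_diag = 1.0 - torch.eye(img.size(0))
    logits = logits - 10.0 * concept_sim * off_diag
    return F.cross_entropy(logits, torch.arange(img.size(0)))

img, txt = torch.randn(4, 128), torch.randn(4, 128)  # stand-in encoder outputs
concept_sim = torch.rand(4, 4)                       # stand-in concept overlap
print(knowledge_weighted_nce(img, txt, concept_sim).item())
```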
arXiv Detail & Related papers (2023-07-14T09:38:22Z)
- Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text.
Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
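As a minimal sketch of the second perspective, knowledge entity embeddings can simply be appended to the image and text token sequences so the fusion transformer attends over them. The entity linker, encoders, and sequence lengths below are assumed for illustration.

```python
import torch
import torch.nn as nn

d = 256
img_tokens = torch.randn(2, 49, d)  # hypothetical image patch embeddings
txt_tokens = torch.randn(2, 32, d)  # hypothetical report token embeddings
knw_tokens = torch.randn(2, 8, d)   # embeddings of linked knowledge entities
fusion = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
fused = fusion(torch.cat([img_tokens, txt_tokens, knw_tokens], dim=1))
print(fused.shape)  # torch.Size([2, 89, 256])
```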
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
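A toy sketch of that injection step follows: encoded relation triples serve as keys and values in a cross-attention layer whose queries are the visual patch features, with a residual connection. CGT's triple extraction and decoder details are assumed rather than reproduced.

```python
import torch
import torch.nn as nn

d = 256
triples = torch.randn(1, 5, d)  # five encoded (subject, relation, object) triples
visual = torch.randn(1, 49, d)  # patch features from the fundus image
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
fused, _ = attn(query=visual, key=triples, value=triples)
visual = visual + fused         # residual knowledge injection before decoding
print(visual.shape)             # torch.Size([1, 49, 256])
```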
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing [17.96645738679543]
We show that textual semantic modelling can substantially improve contrastive learning in self-supervised vision--language processing.
We propose a self-supervised joint vision--language approach with a focus on better text modelling.
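One text-side ingredient often used in this setting is a semantics-preserving augmentation: shuffling report sentences, since the order of findings rarely changes a report's meaning. The sketch below is a generic illustration, not necessarily the paper's exact augmentation.

```python
import random
import re

def shuffle_sentences(report, seed=None):
    """Reorder report sentences; the set of findings is unchanged."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", report.strip()) if s]
    random.Random(seed).shuffle(sentences)
    return " ".join(sentences)

report = "Lungs are clear. Heart size is normal. No pleural effusion."
print(shuffle_sentences(report, seed=0))  # same content, different order
```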
arXiv Detail & Related papers (2022-04-21T00:04:35Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest x-rays.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
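A toy sketch of a learned knowledge base as trainable memory slots, queried by image patch features via scaled dot-product attention, is shown below; the slot count, dimensions, and query scheme are illustrative assumptions rather than the paper's module design.

```python
import torch
import torch.nn as nn

d, slots = 256, 16
memory = nn.Parameter(torch.randn(slots, d))  # the learned knowledge base
img = torch.randn(2, 49, d)                   # patch features from a chest x-ray
attn = torch.softmax(img @ memory.t() / d**0.5, dim=-1)
knowledge = attn @ memory                     # retrieved knowledge per patch
print(knowledge.shape)                        # torch.Size([2, 49, 256])
```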
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- A Practical Approach towards Causality Mining in Clinical Text using Active Transfer Learning [2.6125458645126907]
Causality mining is an active research area that requires the application of state-of-the-art natural language processing techniques.
This work creates a framework that converts clinical text into causal knowledge.
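For orientation, here is a generic uncertainty-sampling loop of the kind active-learning frameworks build on: a classifier trained on the labeled pool queries the least-confident unlabeled sentence for annotation. The features, classifier, and tiny corpus are placeholders; the paper's causality-specific features and transfer-learning step are not reproduced.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = ["smoking causes lung cancer", "patient denies chest pain"]
labels = [1, 0]                         # 1 = causal statement
unlabeled = ["hypertension leads to stroke", "the scan was performed today"]

vec = TfidfVectorizer().fit(labeled + unlabeled)
clf = LogisticRegression().fit(vec.transform(labeled), labels)

proba = clf.predict_proba(vec.transform(unlabeled))
uncertainty = 1 - proba.max(axis=1)     # least-confident sampling
print("annotate next:", unlabeled[int(np.argmax(uncertainty))])
```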
arXiv Detail & Related papers (2020-12-10T06:51:13Z)