Enhancing medical vision-language contrastive learning via
inter-matching relation modelling
- URL: http://arxiv.org/abs/2401.10501v1
- Date: Fri, 19 Jan 2024 05:28:51 GMT
- Title: Enhancing medical vision-language contrastive learning via
inter-matching relation modelling
- Authors: Mingjian Li, Mingyuan Meng, Michael Fulham, David Dagan Feng, Lei Bi,
Jinman Kim
- Abstract summary: Medical image representations can be learned through medical vision-language contrastive learning (mVLCL).
Recent mVLCL methods attempt to align image sub-regions and the report keywords as local-matchings.
We propose a mVLCL method that models the inter-matching relations between local-matchings via a relation-enhanced contrastive learning framework (RECLF).
- Score: 14.777259981193726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image representations can be learned through medical vision-language
contrastive learning (mVLCL) where medical imaging reports are used as weak
supervision through image-text alignment. These learned image representations
can be transferred to and benefit various downstream medical vision tasks such
as disease classification and segmentation. Recent mVLCL methods attempt to
align image sub-regions and the report keywords as local-matchings. However,
these methods aggregate all local-matchings via simple pooling operations while
ignoring the inherent relations between them. These methods therefore fail to
reason between local-matchings that are semantically related, e.g.,
local-matchings that correspond to the disease word and the location word
(semantic-relations), and also fail to differentiate such clinically important
local-matchings from others that correspond to less meaningful words, e.g.,
conjunction words (importance-relations). Hence, we propose a mVLCL method that
models the inter-matching relations between local-matchings via a
relation-enhanced contrastive learning framework (RECLF). In RECLF, we
introduce a semantic-relation reasoning module (SRM) and an importance-relation
reasoning module (IRM) to enable more fine-grained report supervision for image
representation learning. We evaluated our method using four public benchmark
datasets on four downstream tasks, including segmentation, zero-shot
classification, supervised classification, and cross-modal retrieval. Our
results demonstrated the superiority of our RECLF over the state-of-the-art
mVLCL methods with consistent improvements across single-modal and cross-modal
tasks. These results suggest that our RECLF, by modelling the inter-matching
relations, can learn improved medical image representations with better
generalization capabilities.
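To make the relation-modelling idea concrete, below is a minimal, hypothetical PyTorch sketch of the recipe described in the abstract: image sub-regions and report words are fused into local-matchings, a semantic-relation step lets related matchings interact (here via self-attention), and an importance-relation step weights matchings before pooling. All module names, shapes, and the specific attention and weighting choices are illustrative assumptions, not the paper's actual SRM/IRM implementation.

```python
# Hypothetical sketch only: module names (SRM/IRM), shapes, and the attention /
# weighting choices are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def local_matchings(img_regions, word_feats):
    """Toy local-matching: each report word attends over image sub-regions."""
    # (B, R, W): affinity of every region to every word
    attn = F.softmax(img_regions @ word_feats.transpose(1, 2) / word_feats.shape[-1] ** 0.5, dim=1)
    matched_regions = attn.transpose(1, 2) @ img_regions   # (B, W, D) per-word visual summary
    return matched_regions + word_feats                     # fuse the visual and textual sides


class SemanticRelationReasoning(nn.Module):
    """Assumed SRM: self-attention across local-matchings, so matchings for related
    words (e.g. a disease word and its location word) can exchange information."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, m):                                   # m: (B, K, D)
        out, _ = self.attn(m, m, m)
        return self.norm(m + out)


class ImportanceRelationReasoning(nn.Module):
    """Assumed IRM: scores each local-matching so clinically meaningful matchings
    outweigh those for less meaningful words (e.g. conjunctions) during pooling."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, m):                                   # m: (B, K, D)
        w = F.softmax(self.score(m), dim=1)                 # (B, K, 1) importance weights
        return (w * m).sum(dim=1)                           # weighted pooling -> (B, D)


# Random features stand in for image/text encoder outputs.
B, R, W, D = 2, 49, 16, 256
img_regions, word_feats = torch.randn(B, R, D), torch.randn(B, W, D)
srm, irm = SemanticRelationReasoning(D), ImportanceRelationReasoning(D)
pooled = irm(srm(local_matchings(img_regions, word_feats)))  # (B, D) report-supervised image feature
```

The pooled per-report representation would then typically enter a standard image-text contrastive objective (e.g., InfoNCE) alongside a global text embedding.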
Related papers
- Cross-model Mutual Learning for Exemplar-based Medical Image Segmentation [25.874281336821685]
We introduce a novel Cross-model Mutual learning framework for Exemplar-based Medical image Segmentation (CMEMS).
arXiv Detail & Related papers (2024-04-18T00:18:07Z)
- PRIOR: Prototype Representation Joint Learning from Medical Images and Reports [19.336988866061294]
We present a prototype representation learning framework incorporating both global and local alignment between medical images and reports.
In contrast to standard global multi-modality alignment methods, we employ a local alignment module for fine-grained representation.
A sentence-wise prototype memory bank is constructed, enabling the network to focus on low-level localized visual and high-level clinical linguistic features.
arXiv Detail & Related papers (2023-07-24T07:49:01Z)
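As a rough illustration of the sentence-wise prototype idea in the PRIOR summary above, here is a minimal sketch in which sentence embeddings are softly assigned to a learnable bank of prototype vectors; the prototype count, the soft-assignment read-out, and all names are assumptions rather than the PRIOR implementation.

```python
# Hypothetical sketch only: the prototype count, soft assignment, and names are
# illustrative assumptions, not the PRIOR implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeMemoryBank(nn.Module):
    """Learnable bank of prototype vectors shared across all reports; sentences are
    softly assigned to prototypes and read back as prototype-quantised features."""
    def __init__(self, num_prototypes=512, dim=256):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim) * 0.02)

    def forward(self, sent_emb):                        # sent_emb: (B, S, D)
        sim = sent_emb @ self.prototypes.t()            # (B, S, P) similarity to prototypes
        assign = F.softmax(sim, dim=-1)                  # soft assignment per sentence
        return assign @ self.prototypes                  # (B, S, D) quantised sentence features


bank = PrototypeMemoryBank()
sentences = torch.randn(4, 6, 256)                      # 4 reports, 6 sentence embeddings each
quantised = bank(sentences)                              # could drive fine-grained image-sentence alignment
```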
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [24.215619918283462]
We present a novel framework for learning medical visual representations directly from paired radiology reports.
Our framework harnesses the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels.
arXiv Detail & Related papers (2022-10-12T09:31:39Z)
- Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level contrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation.
With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
arXiv Detail & Related papers (2022-02-08T15:12:11Z)
- Unsupervised domain adaptation for cross-modality liver segmentation via joint adversarial learning and self-learning [2.309675169959214]
Liver segmentation on images acquired using computed tomography (CT) and magnetic resonance imaging (MRI) plays an important role in clinical management of liver diseases.
In this work, we report a novel unsupervised domain adaptation framework for cross-modality liver segmentation via joint adversarial learning and self-learning.
arXiv Detail & Related papers (2021-09-13T01:46:28Z)
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
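As a rough illustration of the relation-consistency idea behind ISD above, the sketch below hard-aligns each word to its most similar image region and compares the two intra-modal self-attention maps with a mean-squared distance; the alignment step and the distance choice are illustrative assumptions, not the exact ISD/IAIS formulation.

```python
# Hypothetical sketch only: the hard word-to-region alignment and the mean-squared
# distance are illustrative assumptions, not the exact ISD/IAIS definitions.
import torch
import torch.nn.functional as F


def self_attention(feats):                               # feats: (B, N, D)
    scores = feats @ feats.transpose(1, 2) / feats.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1)                     # (B, N, N) intra-modal relations


def intra_modal_relation_distance(word_feats, region_feats):
    """word_feats: (B, N, D) text tokens; region_feats: (B, M, D) visual regions."""
    sim = word_feats @ region_feats.transpose(1, 2)      # (B, N, M) cross-modal similarity
    idx = sim.argmax(dim=-1)                              # (B, N) best region per word
    aligned = torch.gather(
        region_feats, 1, idx.unsqueeze(-1).expand(-1, -1, region_feats.shape[-1]))
    # Compare linguistic relations with the relations among their aligned visual counterparts.
    return F.mse_loss(self_attention(word_feats), self_attention(aligned))


# Scalar distance; a training scheme in the spirit of IAIS could add it as a regulariser.
loss = intra_modal_relation_distance(torch.randn(2, 12, 128), torch.randn(2, 36, 128))
```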