Related papers: Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

URL: http://arxiv.org/abs/2404.04936v1
Date: Sun, 7 Apr 2024 12:17:40 GMT
Title: Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Authors: Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang,
Abstract summary: We explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging. We bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. We train our model with over 12,000 pairs of chest CT images and radiology reports.
Score: 17.75505740079875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Radiologists highly desire fully automated versatile AI for medical imaging interpretation. However, the lack of extensively annotated large-scale multi-disease datasets has hindered the achievement of this goal. In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging. In light of the limited availability of image-report pairs, we bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. Specifically, we propose a language-guided retrieval method to match each 3D CT image with its semantically closest 2D X-ray image, and perform pair-wise and semantic relation knowledge distillation. Subsequently, we use contrastive learning to align images and reports within the same patient while distinguishing them from the other patients. However, the challenge arises when patients have similar semantic diagnoses, such as healthy patients, potentially confusing if treated as negatives. We introduce a robust contrastive learning that identifies and corrects these false negatives. We train our model with over 12,000 pairs of chest CT images and radiology reports. Extensive experiments across multiple scenarios, including zero-shot learning, report generation, and fine-tuning processes, demonstrate the model's feasibility in interpreting chest CT images.

Related papers

RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities.<n>We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans.<n>We develop two retrieval systems, RadIR-CXR and model-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z)
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans. Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
X-ray2CTPA: Generating 3D CTPA scans from 2D X-ray conditioning [24.233484690096898]
Chest X-rays or chest radiography (CXR) enables limited imaging compared to computed tomography (CT) scans. CT scans entail higher costs, greater radiation exposure, and are less accessible than CXRs. In this work we explore cross-modal translation from a 2D low contrast-resolution X-ray input to a 3D high contrast and spatial-resolutionA scan.
arXiv Detail & Related papers (2024-06-23T13:53:35Z)
CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning. Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural languages.
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography [1.8424705673580284]
We introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. We develop CT-CLIP, a CT-focused contrastive language-image pretraining framework. We create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes.
arXiv Detail & Related papers (2024-03-26T16:19:56Z)
Improving Computed Tomography (CT) Reconstruction via 3D Shape Induction [3.1498833540989413]
We propose shape induction, that is, learning the shape of 3D CT from X-ray without CT supervision, as a novel technique to incorporate realistic X-ray distributions during training of a reconstruction model. Our experiments demonstrate that this process improves both the perceptual quality of generated CT and the accuracy of down-stream classification of pulmonary infectious diseases.
arXiv Detail & Related papers (2022-08-23T13:06:02Z)
Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images [0.17499351967216337]
Pulmonary infections can often be best imaged and evaluated through computed tomography (CT) scans. X-ray, a different type of imaging procedure, is inexpensive, often available at the bedside and more widely available, but offers a simpler, two dimensional image. We show that by relying on a model that learns to generate CT images from X-rays synthetically, we can improve the automatic disease classification accuracy.
arXiv Detail & Related papers (2021-09-23T16:35:15Z)
A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis. Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains. Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
XraySyn: Realistic View Synthesis From a Single Radiograph Through CT Priors [118.27130593216096]
A radiograph visualizes the internal anatomy of a patient through the use of X-ray, which projects 3D information onto a 2D plane. To the best of our knowledge, this is the first work on radiograph view synthesis. We show that by gaining an understanding of radiography in 3D space, our method can be applied to radiograph bone extraction and suppression without groundtruth bone labels.
arXiv Detail & Related papers (2020-12-04T05:08:53Z)
Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is to have the volume metadata lost due multiple reasons. It is possible to get a multi-image to single diagnostic model which mimics human doctor diagnostic process.
arXiv Detail & Related papers (2020-10-20T04:30:09Z)
XRayGAN: Consistency-preserving Generation of X-ray Images from Radiology Reports [19.360283053558604]
We develop methods to generate view-consistent, high-fidelity, and high-resolution X-ray images from radiology reports. This work represents the first one generating consistent and high-resolution X-ray images from radiology reports.
arXiv Detail & Related papers (2020-06-17T05:32:14Z)
Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images [61.862364277007934]
We propose a synergistic learning framework for automated severity assessment of COVID-19 in 3D CT images. A multi-task deep network (called M$2$UNet) is then developed to assess the severity of COVID-19 patients. Our M$2$UNet consists of a patch-level encoder, a segmentation sub-network for lung lobe segmentation, and a classification sub-network for severity assessment.
arXiv Detail & Related papers (2020-05-08T03:16:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.