MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
- URL: http://arxiv.org/abs/2411.18309v2
- Date: Mon, 06 Jan 2025 10:34:37 GMT
- Authors: Xiwei Deng, Xianchun He, Jiangfeng Bao, Yudan Zhou, Shuhui Cai, Congbo Cai, Zhong Chen
- Abstract summary: We propose a Multi-view perception Knowledge-enhanced Transformer (MvKeTR) for chest CT report generation.
A Multi-View Perception Aggregator (MVPA) with view-aware attention synthesizes diagnostic information from multiple anatomical views.
Cross-Modal Knowledge Enhancer (CMKE) retrieves the most similar reports based on the query volume.
- Score: 1.4680538148112467
- Abstract: CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced Transformer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention effectively synthesizes diagnostic information from multiple anatomical views. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) retrieves the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) with learnable nonlinear activation functions as the fundamental building blocks of both modules to better capture intricate diagnostic patterns in CT interpretation. Extensive experiments on the public CTRG-Chest-548K dataset demonstrate that our method outpaces prior state-of-the-art (SOTA) models across almost all metrics. The code will be made publicly available.
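As a rough illustration of the retrieval step the abstract attributes to CMKE, the sketch below ranks stored report embeddings by similarity to a query-volume embedding and returns the closest ones. The function name `top_k_similar_reports`, the toy embeddings, and the use of cosine similarity are assumptions made for illustration; the abstract does not state how similarity is measured or how embeddings are produced.

```python
import numpy as np


def top_k_similar_reports(query_embedding, report_embeddings, k=3):
    """Return indices and cosine similarities of the k closest reports.

    Hypothetical helper: cosine similarity is an assumed metric,
    not one confirmed by the abstract.
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    r = report_embeddings / np.linalg.norm(report_embeddings, axis=1, keepdims=True)
    sims = r @ q
    # Sort in descending order of similarity and keep the first k.
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Toy 2-D embeddings standing in for encoded reports (illustrative only).
reports = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
# Toy embedding standing in for the encoded query CT volume.
query = np.array([2.0, 0.1])

idx, scores = top_k_similar_reports(query, reports, k=2)
print(idx, scores)
```

In a full system, the retrieved reports would then be encoded and fused with the visual features; that fusion step is not sketched here.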
Related papers
- Bridging the Diagnostic Divide: Classical Computer Vision and Advanced AI methods for distinguishing ITB and CD through CTE Scans [2.900410045439515]
A consensus among radiologists has recognized the visceral-to-subcutaneous fat ratio as a surrogate biomarker for differentiating between ITB and CD.
We propose a novel 2D image computer vision algorithm for auto-segmenting subcutaneous fat to automate this ratio calculation.
We trained a ResNet10 model on a dataset of CTE scans with samples from ITB, CD, and normal patients, achieving an accuracy of 75%.
arXiv Detail & Related papers (2024-10-23T17:05:27Z)
- CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models [3.8940162151291804]
This study introduces an innovative paradigm to create an assistive co-pilot system for empowering radiologists.
We develop a collaborative framework to integrate Large Language Models (LLMs) and medical image analysis tools.
arXiv Detail & Related papers (2024-04-11T01:33:45Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset used in this study comprises a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation [0.5939858158928474]
We propose a Dynamic Multi-Domain Knowledge (DMDK) network for radiology diagnostic report generation.
The DMDK network consists of four modules: Chest Feature Extractor (CFE), Dynamic Knowledge Extractor (DKE), Specific Knowledge Extractor (SKE), and Multi-knowledge Integrator (MKI).
We performed extensive experiments on two widely used datasets, IU X-Ray and MIMIC-CXR.
arXiv Detail & Related papers (2023-10-08T11:20:02Z)
- An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports [0.5527944417831603]
The pandemic resulted in vast repositories of unstructured data, including radiology reports, due to increased medical examinations.
Previous research on automated diagnosis of COVID-19 primarily focuses on X-ray images, despite their lower precision compared to computed tomography (CT) scans.
In this work, we leverage unstructured data from a hospital and harness the fine-grained details offered by CT scans to perform zero-shot multi-label classification based on contrastive visual language learning.
arXiv Detail & Related papers (2023-09-04T17:58:01Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- Improving Chest X-Ray Classification by RNN-based Patient Monitoring [0.34998703934432673]
We analyze how information about diagnosis can improve CNN-based image classification models.
We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin.
arXiv Detail & Related papers (2022-10-28T11:47:15Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- BI-RADS-Net: An Explainable Multitask Learning Approach for Cancer Diagnosis in Breast Ultrasound Images [69.41441138140895]
This paper introduces BI-RADS-Net, a novel explainable deep learning approach for cancer detection in breast ultrasound images.
The proposed approach incorporates tasks for explaining and classifying breast tumors, by learning feature representations relevant to clinical diagnosis.
Explanations of the predictions (benign or malignant) are provided in terms of morphological features that are used by clinicians for diagnosis and reporting in medical practice.
arXiv Detail & Related papers (2021-10-05T19:14:46Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)