MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
- URL: http://arxiv.org/abs/2411.18309v3
- Date: Thu, 26 Jun 2025 00:54:18 GMT
- Title: MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement
- Authors: Xiwei Deng, Xianchun He, Jianfeng Bao, Yudan Zhou, Shuhui Cai, Congbo Cai, Zhong Chen
- Abstract summary: We propose a Multi-view perception Knowledge-enhanced TransfoRmer (MvKeTR). A Multi-View Perception Aggregator (MVPA) with view-aware attention synthesizes diagnostic information from multiple anatomical views effectively, and a Cross-Modal Knowledge Enhancer (CMKE) retrieves the most similar reports based on the query volume.
- Score: 1.6355783973385114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced TransfoRmer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) as the fundamental building blocks of both modules, which exhibit superior parameter efficiency and reduced spectral bias to better capture high-frequency components critical for CT interpretation while mitigating overfitting. Extensive experiments on the public CTRG-Chest-548K dataset demonstrate that our method outpaces prior state-of-the-art (SOTA) models across almost all metrics. The code is available at https://github.com/xiweideng/MvKeTR.
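To make the view-aware attention idea concrete, here is a minimal PyTorch sketch that fuses per-view CT features with learnable view embeddings and attention pooling. The class name, the plain-MLP scorer (the paper replaces MLPs with KAN layers), and all dimensions are illustrative assumptions, not the released MvKeTR code.

```python
import torch
import torch.nn as nn

class ViewAwareAggregator(nn.Module):
    """Illustrative stand-in for a multi-view aggregator: one feature
    vector per anatomical view (axial, coronal, sagittal), learnable
    view embeddings, and attention pooling into a fused representation.
    MvKeTR builds its modules from KAN layers rather than the MLP below."""

    def __init__(self, dim: int, num_views: int = 3):
        super().__init__()
        self.view_embed = nn.Parameter(torch.zeros(num_views, dim))  # view identity
        self.score = nn.Sequential(  # plain-MLP scorer; a KAN would replace this
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1)
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, dim), one vector per anatomical view
        scores = self.score(view_feats + self.view_embed)  # (batch, num_views, 1)
        weights = scores.softmax(dim=1)                    # attention over views
        return (weights * view_feats).sum(dim=1)           # fused (batch, dim)

agg = ViewAwareAggregator(dim=512)
fused = agg(torch.randn(2, 3, 512))  # -> shape (2, 512)
```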
Related papers
- CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report Generation [4.376648893167674]
We propose CT-GRAPH, a hierarchical graph attention network that explicitly models radiological knowledge. Our method leverages pretrained 3D medical feature encoders to obtain global and organ-level features. We show that our method achieves a substantial improvement of absolute 7.9% in F1 score over current state-of-the-art methods.
arXiv Detail & Related papers (2025-08-07T13:18:03Z) - OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models [0.49478969093606673]
We propose OrthoInsight, a multi-modal deep learning framework for rib fracture diagnosis and report generation. It integrates a YOLOv9 model for fracture detection, a medical knowledge graph for retrieving clinical context, and a fine-tuned LLaVA language model for generating diagnostic reports. Evaluated on 28,675 annotated CT images and expert reports, it achieves high performance across Diagnostic Accuracy, Content Completeness, Logical Coherence, and Clinical Guidance Value, with an average score of 4.28.
arXiv Detail & Related papers (2025-07-18T15:01:44Z) - Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography [0.0]
This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography.
It focuses on COVID-19, lung opacity, and viral pneumonia.
The results aim to inform the integration of AI-driven diagnostic tools in clinical practice.
arXiv Detail & Related papers (2025-04-16T16:54:37Z) - Vision-Language Models for Acute Tuberculosis Diagnosis: A Multimodal Approach Combining Imaging and Clinical Data [0.0]
This study introduces a Vision-Language Model (VLM) leveraging SIGLIP and Gemma-3b architectures for automated acute tuberculosis (TB) screening.
The VLM combines visual data from chest X-rays with clinical context to generate detailed, context-aware diagnostic reports.
Key acute TB pathologies, including consolidation, cavities, and nodules, were detected with high precision and recall.
arXiv Detail & Related papers (2025-03-17T14:08:35Z) - A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT [67.34586036959793]
There is no fully annotated CT dataset with all anatomies delineated for training.
We propose a novel continual learning-driven CT model that can segment complete anatomies.
Our single unified CT segmentation model, CL-Net, can highly accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies.
arXiv Detail & Related papers (2025-03-16T23:55:02Z) - Bridging the Diagnostic Divide: Classical Computer Vision and Advanced AI methods for distinguishing ITB and CD through CTE Scans [2.900410045439515]
A consensus among radiologists has recognized the visceral-to-subcutaneous fat ratio as a surrogate biomarker for differentiating between ITB and CD.
We propose a novel 2D image computer vision algorithm for auto-segmenting subcutaneous fat to automate this ratio calculation (see the sketch below).
We trained a ResNet10 model on a dataset of CTE scans with samples from ITB, CD, and normal patients, achieving an accuracy of 75%.
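A minimal sketch of the surrogate-biomarker arithmetic, assuming binary fat masks are already available from some segmentation model; the function name and mask conventions are hypothetical, not the paper's code.

```python
import numpy as np

def visceral_to_subcutaneous_ratio(visceral_mask: np.ndarray,
                                   subcutaneous_mask: np.ndarray,
                                   eps: float = 1e-6) -> float:
    """Visceral-to-subcutaneous fat ratio from binary masks of one CTE
    slice; pixel counts cancel units, so the ratio is dimensionless."""
    visceral = visceral_mask.astype(bool).sum()
    subcutaneous = subcutaneous_mask.astype(bool).sum()
    return float(visceral / (subcutaneous + eps))
```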
arXiv Detail & Related papers (2024-10-23T17:05:27Z) - Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z) - Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
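A condensed PyTorch sketch of that design, assuming a ResNet-18 backbone for the visual features and pre-computed demographic text embeddings; every name and hyperparameter here is illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torchvision

class DemographicConditionedReportModel(nn.Module):
    """Illustrative sketch: CNN feature-map tokens from the CXR plus
    demographic embeddings form the decoder memory; a transformer
    decoder generates the report autoregressively."""

    def __init__(self, vocab_size: int, dim: int = 512):
        super().__init__()
        cnn = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # (B, 512, h, w)
        self.tok_embed = nn.Embedding(vocab_size, dim)  # dim must match backbone channels
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, cxr, demo_embed, report_ids):
        # cxr: (B, 3, H, W); demo_embed: (B, n_demo, dim); report_ids: (B, T)
        feats = self.backbone(cxr).flatten(2).transpose(1, 2)  # (B, h*w, dim)
        memory = torch.cat([demo_embed, feats], dim=1)         # fuse both modalities
        tgt = self.tok_embed(report_ids)
        T = report_ids.size(1)                                 # causal mask for decoding
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                               # per-token vocab logits
```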
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation [0.5939858158928474]
We propose a Dynamic Multi-Domain Knowledge(DMDK) network for radiology diagnostic report generation.
The DMDK network consists of four modules: Chest Feature Extractor(CFE), Dynamic Knowledge Extractor(DKE), Specific Knowledge Extractor(SKE), and Multi-knowledge Integrator(MKI) module.
We performed extensive experiments on two widely used datasets, IU X-Ray and MIMIC-CXR.
arXiv Detail & Related papers (2023-10-08T11:20:02Z) - An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports [0.5527944417831603]
The COVID-19 pandemic resulted in vast repositories of unstructured data, including radiology reports, due to increased medical examinations.
Previous research on automated diagnosis of COVID-19 primarily focuses on X-ray images, despite their lower precision compared to computed tomography (CT) scans.
In this work, we leverage unstructured data from a hospital and harness the fine-grained details offered by CT scans to perform zero-shot multi-label classification based on contrastive visual language learning.
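The zero-shot recipe reduces to comparing one image embedding against one text embedding per finding. A minimal sketch, assuming the embeddings come from any CLIP-style contrastive encoder pair; the temperature and threshold values are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_multilabel(image_feat: torch.Tensor,
                         label_prompt_feats: torch.Tensor,
                         threshold: float = 0.5,
                         temperature: float = 0.07):
    """Zero-shot multi-label classification with a contrastive VLM:
    cosine similarity between the scan embedding and one text embedding
    per finding, squashed with a sigmoid so each label is scored
    independently (multi-label, not a softmax over classes)."""
    img = F.normalize(image_feat, dim=-1)          # (dim,)
    txt = F.normalize(label_prompt_feats, dim=-1)  # (num_labels, dim)
    logits = txt @ img / temperature               # one score per finding
    probs = logits.sigmoid()
    return probs, probs > threshold

probs, preds = zero_shot_multilabel(torch.randn(512), torch.randn(10, 512))
```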
arXiv Detail & Related papers (2023-09-04T17:58:01Z) - A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - Improving Chest X-Ray Classification by RNN-based Patient Monitoring [0.34998703934432673]
We analyze how information about diagnosis can improve CNN-based image classification models.
We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin.
arXiv Detail & Related papers (2022-10-28T11:47:15Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
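For reference, a single-level 2D DWT with PyWavelets separates a radiograph into a low-frequency approximation and three high-frequency detail bands, the kind of decomposition such an encoding builds on; the helper name is illustrative, not the paper's code.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_bands(image: np.ndarray, wavelet: str = "haar") -> dict:
    """Single-level 2D DWT: returns the approximation band plus the
    horizontal, vertical, and diagonal detail (high-frequency) bands."""
    cA, (cH, cV, cD) = pywt.dwt2(image, wavelet)
    return {"approx": cA, "horiz": cH, "vert": cV, "diag": cD}

bands = dwt_bands(np.random.rand(256, 256))
print({k: v.shape for k, v in bands.items()})  # each band is (128, 128)
```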
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - SpineOne: A One-Stage Detection Framework for Degenerative Discs and Vertebrae [54.751251046196494]
We propose a one-stage detection framework termed SpineOne to simultaneously localize and classify degenerative discs and vertebrae from MRI slices.
SpineOne is built upon the following three key techniques: 1) a new design of the keypoint heatmap to facilitate simultaneous keypoint localization and classification; 2) the use of attention modules to better differentiate the representations between discs and vertebrae; and 3) a novel gradient-guided objective association mechanism to associate multiple learning objectives at the later training stage.
arXiv Detail & Related papers (2021-10-28T12:59:06Z) - BI-RADS-Net: An Explainable Multitask Learning Approach for Cancer Diagnosis in Breast Ultrasound Images [69.41441138140895]
This paper introduces BI-RADS-Net, a novel explainable deep learning approach for cancer detection in breast ultrasound images.
The proposed approach incorporates tasks for explaining and classifying breast tumors, by learning feature representations relevant to clinical diagnosis.
Explanations of the predictions (benign or malignant) are provided in terms of morphological features that are used by clinicians for diagnosis and reporting in medical practice.
arXiv Detail & Related papers (2021-10-05T19:14:46Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)