IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer
- URL: http://arxiv.org/abs/2308.05633v1
- Date: Thu, 10 Aug 2023 15:22:11 GMT
- Title: IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer
- Authors: Keqiang Fan, Xiaohao Cai, Mahesan Niranjan
- Abstract summary: We propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation.
The framework also allows radiologists to modify disease indicators in real-world scenarios.
- Score: 4.376565880192482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated medical report generation has become increasingly
important in medical analysis. It can produce computer-aided diagnosis
descriptions and thus significantly reduce doctors' workload. Inspired by the
success of neural machine translation and image captioning, various deep
learning methods have been proposed for medical report generation. However, due
to the inherent properties of medical data, including data imbalance and the
length of and correlation between report sequences, reports generated by
existing methods may be linguistically fluent yet lack clinical accuracy. In this
work, we propose an image-to-indicator hierarchical transformer (IIHT)
framework for medical report generation. It consists of three modules, i.e., a
classifier module, an indicator expansion module and a generator module. The
classifier module first extracts image features from the input medical images
and produces disease-related indicators with their corresponding states. The
disease-related indicators are subsequently utilised as input for the indicator
expansion module, incorporating the "data-text-data" strategy. The
transformer-based generator then leverages these extracted features along with
image features as auxiliary information to generate the final reports.
Furthermore, the IIHT framework allows radiologists to modify the disease
indicators in real-world scenarios; these edits feed back into the indicator
expansion module, yielding fluent and clinically accurate reports. Extensive
experiments and comparisons with state-of-the-art methods under various
evaluation metrics demonstrate the strong performance of the proposed method.
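To make the data flow concrete, the following is a minimal PyTorch sketch of the three-module pipeline described above. The indicator set, state set, backbone, and all dimensions are illustrative assumptions rather than the authors' implementation; expand_indicators stands in for the "data-text-data" step and marks the point where a radiologist could edit the predicted indicators before generation.

```python
# Minimal sketch of the IIHT data flow (classifier -> indicator expansion
# -> generator). All names, sizes, and the indicator/state vocabularies
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

INDICATORS = ["cardiomegaly", "edema", "effusion"]  # assumed indicator set
STATES = ["absent", "present", "uncertain"]         # assumed state set

class Classifier(nn.Module):
    """Extracts image features and predicts a state for each indicator."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in CNN backbone
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.heads = nn.Linear(feat_dim, len(INDICATORS) * len(STATES))

    def forward(self, image):
        feats = self.backbone(image)                # (B, feat_dim)
        logits = self.heads(feats).view(-1, len(INDICATORS), len(STATES))
        return feats, logits

def expand_indicators(logits):
    """'Data-text-data' step: serialise predicted (or radiologist-edited)
    indicator states into text for the generator."""
    states = logits.argmax(-1)                      # (B, n_indicators)
    return [", ".join(f"{ind}: {STATES[s]}"
                      for ind, s in zip(INDICATORS, row))
            for row in states.tolist()]

class Generator(nn.Module):
    """Transformer decoder conditioned on image features plus the
    embedded (tokenised) indicator sequence as decoder memory."""
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, report_ids, image_feats, indicator_ids):
        memory = torch.cat([image_feats.unsqueeze(1),
                            self.embed(indicator_ids)], dim=1)
        return self.out(self.decoder(self.embed(report_ids), memory))

images = torch.randn(2, 1, 224, 224)                # dummy grayscale X-rays
feats, logits = Classifier()(images)
print(expand_indicators(logits))                    # editable before generation
# With a tokenizer, the indicator text would become indicator_ids:
report_logits = Generator()(torch.randint(0, 1000, (2, 12)),
                            feats, torch.randint(0, 1000, (2, 6)))
```

The design point this sketch illustrates is that the generator never consumes raw classifier logits: indicators pass through a textual form, so a radiologist can correct them without any change to the model.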
Related papers
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning [39.17345313432545]
We propose MSCL (Medical image Segmentation with Contrastive Learning) to segment organs, abnormalities, bones, etc.
We introduce a supervised contrastive loss that assigns more weight to reports that are semantically similar to the target during training (see the loss sketch after this list).
Experimental results demonstrate the effectiveness of the proposed model, achieving state-of-the-art performance on the public IU X-Ray dataset.
arXiv Detail & Related papers (2023-12-26T03:33:48Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- KiUT: Knowledge-injected U-Transformer for Radiology Report Generation [10.139767157037829]
Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image.
We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information.
arXiv Detail & Related papers (2023-06-20T07:27:28Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest X-rays.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Automated Generation of Accurate & Fluent Medical X-ray Reports [17.927768992248172]
The paper focuses on automating the generation of medical reports from chest X-ray image inputs.
Our approach achieved promising results on commonly-used metrics concerning language fluency and clinical accuracy.
arXiv Detail & Related papers (2021-08-27T05:47:28Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
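For the segment-enhanced contrastive entry above, the reweighted supervised contrastive loss can be sketched as follows. This is a minimal sketch, not the MSCL authors' implementation: the similarity matrix (e.g., TF-IDF cosine over reports), the weighting scheme, and the temperature are all assumptions.

```python
# Minimal sketch of a supervised contrastive loss whose positives are
# weighted by semantic similarity between reports, as described in the
# MSCL entry. The weighting scheme and temperature are assumptions.
import torch
import torch.nn.functional as F

def weighted_supcon_loss(embeddings, report_sims, temperature=0.1):
    """embeddings: (N, D) image/region features; report_sims: (N, N)
    semantic similarity of the paired reports in [0, 1], with larger
    values treated as stronger positives and weighted more heavily."""
    z = F.normalize(embeddings, dim=1)
    logits = z @ z.t() / temperature
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, float("-inf"))   # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)         # avoid 0 * -inf = nan
    weights = report_sims.masked_fill(eye, 0.0)
    per_anchor = -(weights * log_prob).sum(1) / weights.sum(1).clamp(min=1e-8)
    return per_anchor.mean()

feats = torch.randn(8, 128)
sims = torch.rand(8, 8)
sims = (sims + sims.t()) / 2   # symmetric similarities, e.g. TF-IDF cosine
print(weighted_supcon_loss(feats, sims))
```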
This list is automatically generated from the titles and abstracts of the papers in this site.