On the Importance of Image Encoding in Automated Chest X-Ray Report
Generation
- URL: http://arxiv.org/abs/2211.13465v1
- Date: Thu, 24 Nov 2022 08:02:52 GMT
- Title: On the Importance of Image Encoding in Automated Chest X-Ray Report
Generation
- Authors: Otabek Nazarov, Mohammad Yaqub, Karthik Nandakumar
- Abstract summary: Chest X-ray is one of the most popular medical imaging modalities due to its accessibility and effectiveness.
There is a chronic shortage of well-trained radiologists who can interpret these images and diagnose the patient's condition.
automated radiology report generation can be a very helpful tool in clinical practice.
- Score: 4.843654097048771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chest X-ray is one of the most popular medical imaging modalities due to its
accessibility and effectiveness. However, there is a chronic shortage of
well-trained radiologists who can interpret these images and diagnose the
patient's condition. Therefore, automated radiology report generation can be a
very helpful tool in clinical practice. A typical report generation workflow
consists of two main steps: (i) encoding the image into a latent space and (ii)
generating the text of the report based on the latent image embedding. Many
existing report generation techniques use a standard convolutional neural
network (CNN) architecture for image encoding followed by a Transformer-based
decoder for medical text generation. In most cases, CNN and the decoder are
trained jointly in an end-to-end fashion. In this work, we primarily focus on
understanding the relative importance of encoder and decoder components.
Towards this end, we analyze four different image encoding approaches: direct,
fine-grained, CLIP-based, and Cluster-CLIP-based encodings in conjunction with
three different decoders on the large-scale MIMIC-CXR dataset. Among these
encoders, the cluster CLIP visual encoder is a novel approach that aims to
generate more discriminative and explainable representations. CLIP-based
encoders produce comparable results to traditional CNN-based encoders in terms
of NLP metrics, while fine-grained encoding outperforms all other encoders both
in terms of NLP and clinical accuracy metrics, thereby validating the
importance of image encoder to effectively extract semantic information. GitHub
repository: https://github.com/mudabek/encoding-cxr-report-gen
Related papers
- BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image
Segmentation [0.0]
This paper proposes an innovative U-shaped network called BEFUnet, which enhances the fusion of body and edge information for precise medical image segmentation.
The BEFUnet comprises three main modules, including a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and dual-branch encoder.
The LCAF module efficiently fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two modalities.
arXiv Detail & Related papers (2024-02-13T21:03:36Z) - Bidirectional Captioning for Clinically Accurate and Interpretable
Models [4.355562946859011]
Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks.
In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of learned embeddings with those from contrastive pretraining methods.
Results show that not only does captioning pretraining yield visual encoders that are competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder is capable of generating clinically relevant reports.
arXiv Detail & Related papers (2023-10-30T15:25:29Z) - Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach
using a Dilated Transformer and U-Net Architecture [0.6445605125467572]
This paper introduces Dilated-UNet, which combines a Dilated Transformer block with the U-Net architecture for accurate and fast medical image segmentation.
The results of our experiments show that Dilated-UNet outperforms other models on several challenging medical image segmentation datasets.
arXiv Detail & Related papers (2023-04-22T17:20:13Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs)
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - Atrous Residual Interconnected Encoder to Attention Decoder Framework
for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.