Understanding Transfer Learning for Chest Radiograph Clinical Report
Generation with Modified Transformer Architectures
- URL: http://arxiv.org/abs/2205.02841v1
- Date: Thu, 5 May 2022 03:08:05 GMT
- Title: Understanding Transfer Learning for Chest Radiograph Clinical Report
Generation with Modified Transformer Architectures
- Authors: Edward Vendrow, Ethan Schonfeld
- Abstract summary: We train a series of modified transformers to generate clinical reports from chest radiograph image input.
We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate scores competitive with state-of-the-art models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The image captioning task is increasingly prevalent in artificial
intelligence applications for medicine. One important application is clinical
report generation from chest radiographs. The clinical writing of unstructured
reports is time-consuming and error-prone. An automated system would improve
standardization, reduce errors, save clinicians' time, and broaden access to care.
In this paper we demonstrate the importance of domain-specific pre-training and
propose a modified transformer architecture for the medical image captioning
task. To accomplish this, we train a series of modified transformers to
generate clinical reports from chest radiograph image input. These modified
transformers include: a meshed-memory augmented transformer architecture with
visual extractor using ImageNet pre-trained weights, a meshed-memory augmented
transformer architecture with visual extractor using CheXpert pre-trained
weights, and a meshed-memory augmented transformer whose encoder is passed the
concatenated embeddings using both ImageNet pre-trained weights and CheXpert
pre-trained weights. We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical
CheXbert F1 scores to validate our models and demonstrate scores competitive
with state-of-the-art models. We provide evidence that ImageNet pre-training is
ill-suited for the medical image captioning task, especially for less frequent
conditions (e.g., enlarged cardiomediastinum, lung lesion, pneumothorax).
Furthermore, we demonstrate that the double-feature model improves performance
for specific medical conditions (edema, consolidation, pneumothorax, support
devices) and overall CheXbert F1 score, and should be developed further in
future work. Such a double-feature model, combining ImageNet pre-training with
domain-specific pre-training, could be used in a wide range of image
captioning models in medicine.
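The double-feature idea can be sketched in a few lines. The extractors below are random-vector stubs standing in for the ImageNet- and CheXpert-pre-trained backbones (the function names and the 2048-dim outputs are assumptions of this sketch, not the paper's code); the point is only the data flow, where both embeddings are concatenated before reaching the transformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two visual extractors. In the paper these are CNN
# backbones with ImageNet- and CheXpert-pre-trained weights; here they
# are random-vector stubs used only to illustrate the data flow.
def imagenet_features(image, dim=2048):
    return rng.standard_normal(dim)

def chexpert_features(image, dim=2048):
    return rng.standard_normal(dim)

def double_feature_embedding(image):
    # Concatenate both extractors' embeddings; the combined vector is
    # what the transformer encoder would consume.
    return np.concatenate([imagenet_features(image), chexpert_features(image)])

image = np.zeros((224, 224, 3))  # placeholder radiograph
emb = double_feature_embedding(image)
print(emb.shape)  # (4096,)
```

Concatenation keeps both representations intact and leaves it to the encoder to weight general-purpose versus domain-specific features per condition.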
Related papers
- Automatic Report Generation for Histopathology images using pre-trained
Vision Transformers [1.2781698000674653]
We show that an existing pre-trained Vision Transformer can be used in a two-step process: first to encode 4096x4096-sized patches of the Whole Slide Image (WSI), and then as the encoder, paired with an LSTM decoder, for report generation.
We also use representations from an existing powerful pre-trained hierarchical vision transformer and show their usefulness not just for zero-shot classification but also for report generation.
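The two-step process described above can be sketched as follows; the `vit_encode` helper, its 768-dim output, and the tiny stand-in patches are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: a pre-trained ViT encodes each WSI patch into a fixed-size
# embedding. Stubbed here with random vectors.
def vit_encode(patch, dim=768):
    return rng.standard_normal(dim)

def encode_wsi(patches):
    # Stack per-patch embeddings into one sequence for the decoder.
    return np.stack([vit_encode(p) for p in patches])

# Step 2: an LSTM decoder would consume this (n_patches, dim) sequence
# to generate the report token by token (decoder omitted from sketch).
patches = [np.zeros((64, 64)) for _ in range(3)]  # small stand-in patches
seq = encode_wsi(patches)
print(seq.shape)  # (3, 768)
```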
arXiv Detail & Related papers (2023-11-10T16:48:24Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
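One way to picture the disruption step is below; the patch size, mask ratio, and Gaussian-noise perturbation are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(1)

def disrupt(volume, patch=4, mask_ratio=0.5, noise_std=0.1):
    # Illustrative disruption: zero out a random subset of local patches
    # (local masking), then add low-level Gaussian noise (perturbation).
    d, h, w = volume.shape
    out = volume.copy()
    for z in range(0, d, patch):
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                if rng.random() < mask_ratio:
                    out[z:z + patch, y:y + patch, x:x + patch] = 0.0
    out += rng.normal(0.0, noise_std, out.shape)
    return out

vol = rng.random((8, 8, 8))  # toy 3D volume
corrupted = disrupt(vol)
# A reconstruction loss would then compare the model's output to `vol`.
mse = np.mean((corrupted - vol) ** 2)
```

The model never sees the clean volume as input; it must recover it from the corrupted version, which is what forces the representation to capture both structure and low-level detail.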
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - A New Perspective to Boost Vision Transformer for Medical Image
Classification [33.215289791017064]
We propose a self-supervised learning approach specifically for medical image classification with the Transformer backbone.
Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning.
The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
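Online/target two-branch setups of this kind are commonly implemented with an exponential-moving-average target, as in BYOL; a minimal sketch of that update rule follows (the EMA form and the momentum value are assumptions, not BOLT's published details).

```python
import numpy as np

def ema_update(target_params, online_params, momentum=0.99):
    # BYOL-style rule (an assumption here): the target branch tracks an
    # exponential moving average of the online branch's parameters.
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]

online = [np.ones(4)]   # toy online-branch parameters
target = [np.zeros(4)]  # toy target-branch parameters
target = ema_update(target, online)
print(target[0])  # ≈ [0.01 0.01 0.01 0.01]
```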
arXiv Detail & Related papers (2023-01-03T07:45:59Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - Preservation of High Frequency Content for Deep Learning-Based Medical
Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
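A single level of the 2D Haar DWT, the simplest instance of the transform family this summary refers to, can be written directly in NumPy; this is a generic Haar decomposition for illustration, not the paper's specific encoding.

```python
import numpy as np

def haar_dwt2(img):
    # One level of a 2D Haar DWT: returns the approximation (LL) and
    # detail (LH, HL, HH) sub-bands, each half the input size. The HH
    # band carries the high-frequency content the paper aims to preserve.
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2)
```

For a smooth image the detail bands are near zero, so edges and fine texture concentrate in LH/HL/HH, which is what makes the decomposition useful for identifying high-frequency visual information.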
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - Self Pre-training with Masked Autoencoders for Medical Image
Classification and Segmentation [37.25161294917211]
Masked Autoencoder (MAE) has been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis.
We investigate a self pre-training paradigm with MAE for medical image analysis tasks.
arXiv Detail & Related papers (2022-03-10T16:22:38Z) - Class-Aware Generative Adversarial Transformers for Medical Image
Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z) - Pre-training and Fine-tuning Transformers for fMRI Prediction Tasks [69.85819388753579]
TFF employs a transformer-based architecture and a two-phase training approach.
Self-supervised training is applied to a collection of fMRI scans, where the model is trained for the reconstruction of 3D volume data.
Results show state-of-the-art performance on a variety of fMRI tasks, including age and gender prediction, as well as schizophrenia recognition.
arXiv Detail & Related papers (2021-12-10T18:04:26Z) - Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays [10.398175542736285]
We introduce an image-text pre-training framework that can learn from mixed data inputs.
We demonstrate the feasibility of pre-training across mixed data inputs.
We also illustrate the benefits of adopting such pre-trained models in 3 chest X-ray applications.
arXiv Detail & Related papers (2021-03-30T01:48:46Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.