MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
- URL: http://arxiv.org/abs/2407.03919v2
- Date: Mon, 22 Jul 2024 07:49:34 GMT
- Title: MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
- Authors: Elad Hirsch, Gefen Dawidowicz, Ayellet Tal
- Abstract summary: We propose a novel model that leverages the available information in two distinct datasets.
Our model, named MedRAT, surpasses previous state-of-the-art methods.
- Score: 11.190146577567548
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Medical report generation from X-ray images is a challenging task, particularly in an unpaired setting where no paired image-report data is available for training. To address this challenge, we propose a novel model that leverages the information available in two distinct datasets, one comprising reports and the other images. The core idea is that combining auto-encoding report generation with multi-modal (report-image) alignment can offer a solution; the difficulty lies in achieving this alignment when pair correspondence is absent. Our solution uses auxiliary tasks, in particular contrastive learning and classification, to position related images and reports close to each other. This differs from previous methods that rely on pre-processing steps, such as external information stored in a knowledge graph. Our model, named MedRAT, surpasses previous state-of-the-art methods, demonstrating that comprehensive medical reports can be generated without paired data or external tools.
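To make the auxiliary-task idea concrete, here is a minimal PyTorch sketch assuming shared multi-hot pathology labels act as the bridge between the unpaired modalities; the encoders, label set, and loss forms are illustrative assumptions, not MedRAT's actual design.

```python
# Hedged sketch: a supervised contrastive loss pulls unpaired image/report
# embeddings together when they share pathology labels, and an auxiliary
# classification loss keeps each modality's embeddings semantically organized.
import torch
import torch.nn.functional as F

def supervised_contrastive_alignment(img_emb, rep_emb, img_labels, rep_labels, tau=0.07):
    """img_emb: (Ni, d), rep_emb: (Nr, d) -- L2-normalized embeddings.
    img_labels: (Ni, C), rep_labels: (Nr, C) -- multi-hot pathology labels (float)."""
    sim = img_emb @ rep_emb.t() / tau                    # (Ni, Nr) similarity logits
    # Assumed bridge: an image-report pair counts as positive if the two
    # samples share at least one pathology label.
    pos = (img_labels @ rep_labels.t() > 0).float()      # (Ni, Nr) positive mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)  # row-wise log-softmax
    valid = pos.sum(dim=1) > 0                           # skip images with no positives
    loss = -(pos * log_prob).sum(dim=1)[valid] / pos.sum(dim=1)[valid]
    return loss.mean()

def auxiliary_classification_loss(logits, labels):
    # Multi-label pathology classification on each modality.
    return F.binary_cross_entropy_with_logits(logits, labels)
```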
Related papers
- PairAug: What Can Augmented Image-Text Pairs Do for Radiology? [23.042820473327303]
Current vision-language pre-training methodologies predominantly depend on paired image-text datasets.
We propose a framework capable of concurrently augmenting medical image and text data.
arXiv Detail & Related papers (2024-04-07T13:40:29Z)
- MedCycle: Unpaired Medical Report Generation via Cycle-Consistency [11.190146577567548]
We introduce an innovative approach that eliminates the need for consistent labeling schemas.
This approach is based on cycle-consistent mapping functions that transform image embeddings into report embeddings.
It outperforms state-of-the-art results in unpaired chest X-ray report generation, demonstrating improvements in both language and clinical metrics.
arXiv Detail & Related papers (2024-03-20T09:40:11Z)
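The cycle-consistent mapping described in the MedCycle entry above can be illustrated with a short sketch; the MLP mappers and L1 cycle penalty below are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: two small MLPs map between image- and report-embedding
# spaces, trained so that a round trip returns to the starting point.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_img = d_rep = 512  # assumed embedding sizes
f = nn.Sequential(nn.Linear(d_img, 1024), nn.ReLU(), nn.Linear(1024, d_rep))  # image -> report space
g = nn.Sequential(nn.Linear(d_rep, 1024), nn.ReLU(), nn.Linear(1024, d_img))  # report -> image space

def cycle_loss(img_emb, rep_emb):
    # img -> rep -> img and rep -> img -> rep should both be identity-like maps.
    return (F.l1_loss(g(f(img_emb)), img_emb)
            + F.l1_loss(f(g(rep_emb)), rep_emb))
```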
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
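A rough sketch of how the conditioning described in the entry above might look, with a CNN backbone and a transformer decoder attending over both visual tokens and embedded demographic text; dimensions, module choices, and names are illustrative assumptions, not the paper's reported configuration.

```python
# Hedged sketch: decoder memory = visual tokens concatenated with demographic
# text embeddings, so report generation conditions on both at once.
import torch
import torch.nn as nn
import torchvision

class DemographicConditionedReportGen(nn.Module):
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        cnn = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # -> (B, 2048, 7, 7)
        self.proj = nn.Linear(2048, d_model)                       # visual tokens -> d_model
        self.tok_emb = nn.Embedding(vocab_size, d_model)           # shared text embedding
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, cxr, demo_tokens, report_tokens):
        feats = self.backbone(cxr).flatten(2).transpose(1, 2)      # (B, 49, 2048)
        memory = torch.cat([self.proj(feats), self.tok_emb(demo_tokens)], dim=1)
        tgt = self.tok_emb(report_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.lm_head(self.decoder(tgt, memory, tgt_mask=mask))
```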
- Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting [5.596515201054671]
We propose a two-step approach for radiology report generation.
First, we extract the content from an image; then, we verbalize the extracted content into a report that matches the style of a specific radiologist.
arXiv Detail & Related papers (2023-10-26T23:06:38Z)
- Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives [23.670280341513795]
This paper proposes a novel framework to learn discriminative image and report features.
It distinguishes them from their closest peers, i.e., hard negatives.
It can serve as a plug-in to readily improve existing medical report generation models.
arXiv Detail & Related papers (2023-05-11T23:12:13Z)
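One plausible reading of the "increasingly hard negatives" entry above is a contrastive loss whose negative set is restricted to the most similar non-matching pairs, with k annealed downward during training; the sketch below is an assumption-laden illustration, not the paper's exact objective.

```python
# Hedged sketch: only each sample's top-k most similar non-matching pairs
# in the batch serve as negatives for a cross-entropy contrastive loss.
import torch
import torch.nn.functional as F

def hard_negative_contrastive(img_emb, rep_emb, k, tau=0.07):
    """img_emb, rep_emb: (N, d) L2-normalized embeddings of paired rows."""
    sim = img_emb @ rep_emb.t() / tau                       # (N, N); diagonal = positives
    n = sim.size(0)
    pos = sim.diagonal().unsqueeze(1)                       # (N, 1)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(eye, float('-inf'))               # drop the positives
    hard = neg.topk(min(k, n - 1), dim=1).values            # keep hardest negatives only
    logits = torch.cat([pos, hard], dim=1)                  # positive sits at index 0
    target = torch.zeros(n, dtype=torch.long, device=sim.device)
    return F.cross_entropy(logits, target)
```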
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- A Medical Semantic-Assisted Transformer for Radiographic Report Generation [39.99216295697047]
We propose a memory-augmented sparse attention block to capture the higher-order interactions between the input fine-grained image features.
We also introduce a novel Medical Concepts Generation Network (MCGN) to predict fine-grained semantic concepts and incorporate them into the report generation process as guidance.
arXiv Detail & Related papers (2022-08-22T14:38:19Z)
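A loose sketch of what the memory-augmented sparse attention block named in the entry above could look like, with learnable memory slots appended to the keys/values and per-query top-k sparsification; slot count, k, and dimensions are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: memory slots extend the key/value set beyond the image's own
# features, and each query attends only to its top-k highest-scoring entries.
import torch
import torch.nn as nn

class MemoryAugmentedSparseAttention(nn.Module):
    def __init__(self, d_model=512, n_mem=32, topk=16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mem_k = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.mem_v = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.topk, self.scale = topk, d_model ** -0.5

    def forward(self, x):                                   # x: (B, N, d) image features
        b = x.size(0)
        q = self.q(x)
        k = torch.cat([self.k(x), self.mem_k.expand(b, -1, -1)], dim=1)
        v = torch.cat([self.v(x), self.mem_v.expand(b, -1, -1)], dim=1)
        scores = q @ k.transpose(1, 2) * self.scale         # (B, N, N + n_mem)
        # Sparsify: each query keeps only its top-k scores; the rest are masked.
        kth = scores.topk(self.topk, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < kth, float('-inf'))
        return scores.softmax(dim=-1) @ v
```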
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)