A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data
- URL: http://arxiv.org/abs/2405.12833v1
- Date: Tue, 21 May 2024 14:37:35 GMT
- Title: A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data
- Authors: Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Wei Emma Zhang, Weitong Chen, Xin Chen
- Abstract summary: Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources.
It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data.
Recent works emerged to address this issue using deep learning-based methods, such as transformers, contrastive learning, and knowledge-base construction.
This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep learning-based report generation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, therefore becoming an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.), and produce comprehensive and accurate reports. Recently, numerous works emerged to address this issue using deep learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep learning-based report generation with five main components, including multi-modality data acquisition, data preparation, feature learning, feature fusion/interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, training strategies, public datasets, evaluation methods, current challenges, and future directions in this field are summarized. We have also conducted a quantitative comparison between different methods under the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and assist them in developing new algorithms to advance the field.
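The five-component workflow the survey proposes can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in (toy data, template "decoder"), not the survey's code; it only shows how the stages chain together.

```python
# Illustrative sketch of the survey's five-component workflow:
# acquisition -> preparation -> feature learning -> fusion -> report generation.
# All functions, shapes, and values are hypothetical stand-ins.
from typing import Dict, List


def acquire_data(study_id: str) -> Dict[str, object]:
    """1. Multi-modality data acquisition: gather an image plus clinical context."""
    return {"image": [[0.1, 0.4], [0.3, 0.2]],           # toy pixel grid
            "clinical_note": "65yo male, persistent cough"}


def prepare(sample: Dict[str, object]) -> Dict[str, object]:
    """2. Data preparation: min-max normalize the image, tokenize the text."""
    image = sample["image"]
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    norm = [[(v - lo) / (hi - lo) for v in row] for row in image]
    tokens = sample["clinical_note"].lower().split()
    return {"image": norm, "tokens": tokens}


def learn_features(sample: Dict[str, object]) -> Dict[str, List[float]]:
    """3. Feature learning: toy encoders (mean-pooled image, text length)."""
    flat = [v for row in sample["image"] for v in row]
    visual = [sum(flat) / len(flat)]                     # stand-in visual feature
    textual = [len(sample["tokens"]) / 10.0]             # stand-in text feature
    return {"visual": visual, "textual": textual}


def fuse(features: Dict[str, List[float]]) -> List[float]:
    """4. Feature fusion/interaction: simple early fusion by concatenation."""
    return features["visual"] + features["textual"]


def generate_report(fused: List[float]) -> str:
    """5. Report generation: a template standing in for a transformer decoder."""
    severity = "notable" if fused[0] > 0.5 else "minimal"
    return f"Findings: {severity} opacity pattern; clinical context considered."


def pipeline(study_id: str) -> str:
    sample = prepare(acquire_data(study_id))
    return generate_report(fuse(learn_features(sample)))
```

In real systems each stage is far richer (DICOM ingestion, CNN/ViT encoders, cross-attention fusion, autoregressive decoding), but the data flow between the five components is the same.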
Related papers
- Large Language Model Benchmarks in Medical Tasks [11.196196955468992]
This paper presents a survey of various benchmark datasets employed in medical large language models (LLMs) tasks.
The survey categorizes the datasets by modality, discussing their significance, data structure, and impact on the development of LLMs.
The paper emphasizes the need for datasets with a greater degree of language diversity, structured omics data, and innovative approaches to synthesis.
arXiv Detail & Related papers (2024-10-28T11:07:33Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Review of multimodal machine learning approaches in healthcare [0.0]
Clinicians rely on a variety of data sources to make informed decisions.
Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data.
arXiv Detail & Related papers (2024-02-04T12:21:38Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
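The fusion step described above — visual features from a CXR encoder combined with a semantic embedding of patient demographics — can be sketched as a scaled dot-product attention of the demographic embedding over image patch features. The names, dimensions, and values below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: one demographic "query" attends over visual patch
# features, producing a single fused vector a decoder could condition on.
import math
from typing import List


def attend(query: List[float], keys: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one query over a list of key vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                                  # subtract max for stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]                   # softmax over patches
    # Weighted sum of patch features -> fused representation.
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


# Toy example: two 3-d "patch" features and a demographic embedding.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
demo_embedding = [1.0, 0.0, 0.0]
fused = attend(demo_embedding, patches)
```

The patch most aligned with the demographic embedding receives the larger softmax weight, so non-imaging data steers which visual evidence dominates the fused representation.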
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knowledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
- Multimodal Learning for Multi-Omics: A Survey [4.15790071124993]
Multimodal learning for integrative multi-omics analysis can help researchers and practitioners gain deep insights into human diseases.
However, several challenges are hindering the development in this area, including the availability of easily accessible open-source tools.
This survey aims to provide an up-to-date overview of the data challenges, fusion approaches, datasets, and software tools from several new perspectives.
arXiv Detail & Related papers (2022-11-29T12:08:06Z)
- Multimodal Machine Learning in Precision Health [10.068890037410316]
This review was conducted to summarize this field and identify topics ripe for future research.
We used a combination of content analysis and literature searches to establish search strings, then searched PubMed, Google Scholar, and IEEE Xplore for work from 2011 to 2021.
The most common form of information fusion was early fusion. Notably, predictive performance improved when heterogeneous data were fused.
arXiv Detail & Related papers (2022-04-10T21:56:07Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
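The Prototypical Network idea the model builds on is simple enough to sketch: each class prototype is the mean of that class's support-set embeddings, and a query is assigned to the nearest prototype. The embeddings and subtype names below are toy stand-ins, not data or code from the paper.

```python
# Minimal sketch of prototypical-network classification for few-shot
# disease subtype prediction. All vectors and labels are hypothetical.
from typing import Dict, List


def prototype(support: List[List[float]]) -> List[float]:
    """Class prototype = mean of that class's support embeddings."""
    n = len(support)
    return [sum(vec[i] for vec in support) / n for i in range(len(support[0]))]


def classify(query: List[float], protos: Dict[str, List[float]]) -> str:
    """Assign the query to the class with the nearest (squared Euclidean) prototype."""
    def dist2(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda c: dist2(query, protos[c]))


# Two disease subtypes, each with a few "support" patient embeddings.
protos = {
    "subtype_a": prototype([[0.0, 0.1], [0.2, 0.0]]),
    "subtype_b": prototype([[1.0, 0.9], [0.8, 1.1]]),
}
label = classify([0.9, 1.0], protos)
```

In the full meta-learning setup the embedding function is learned across many interrelated clinical tasks, so the prototypes generalize to subtypes seen with only a handful of patients.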
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- A Survey on Incorporating Domain Knowledge into Deep Learning for Medical Image Analysis [38.90186125141749]
The small size of medical datasets remains a major bottleneck in deep learning.
Traditional approaches leverage the information from natural images via transfer learning.
More recent works utilize the domain knowledge from medical doctors to create networks that resemble how medical doctors are trained.
arXiv Detail & Related papers (2020-04-25T14:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.