A Transformer-based representation-learning model with unified
processing of multimodal input for clinical diagnostics
- URL: http://arxiv.org/abs/2306.00864v1
- Date: Thu, 1 Jun 2023 16:23:47 GMT
- Title: A Transformer-based representation-learning model with unified
processing of multimodal input for clinical diagnostics
- Authors: Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhang, Yuanxu Gao, Jia Pan,
Jun Shao, Guangming Lu, Kang Zhang, Weimin Li
- Abstract summary: We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
- Score: 63.106382317917344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During the diagnostic process, clinicians leverage multimodal information,
such as chief complaints, medical images, and laboratory-test results.
Deep-learning models for aiding diagnosis have yet to meet this requirement.
Here we report a Transformer-based representation-learning model as a clinical
diagnostic aid that processes multimodal input in a unified manner. Rather than
learning modality-specific features, the model uses embedding layers to convert
images and unstructured and structured text into visual tokens and text tokens,
and bidirectional blocks with intramodal and intermodal attention to learn a
holistic representation of radiographs, the unstructured chief complaint and
clinical history, and structured clinical information such as laboratory-test
results and patient demographic information. The unified model outperformed an
image-only model and non-unified multimodal diagnosis models in the
identification of pulmonary diseases (by 12% and 9%, respectively) and in the
prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and
7%, respectively). Leveraging unified multimodal Transformer-based models may
help streamline triage of patients and facilitate the clinical decision
process.
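The unified design described in the abstract can be illustrated with a minimal, pure-Python sketch. It rests on several assumptions of ours, not the authors' implementation. Visual tokens come from non-overlapping image patches passed through a linear projection. Unstructured text tokens come from a word lookup table. Each structured field (a lab value or demographic item) scales its own learned field vector. "Intramodal" attention is taken to mean self-attention within a modality, and "intermodal" attention to mean cross-attention to the other modality, with the two summed together with a residual. All function names, dimensions and toy inputs are hypothetical. A real model would also use learned Q/K/V projections, multiple heads, normalization, feed-forward layers and many stacked blocks.

```python
import math
import random

def linear(weights, x):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (hypothetical)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def image_to_visual_tokens(image, patch, proj):
    """Cut an H x W image into non-overlapping patch x patch tiles and project each one."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            flat = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            tokens.append(linear(proj, flat))
    return tokens

def text_to_tokens(word_ids, field_values, word_table, field_vectors):
    """Unstructured words via table lookup; each structured value scales its field's vector."""
    tokens = [list(word_table[i]) for i in word_ids]
    for vec, value in zip(field_vectors, field_values):
        tokens.append([value * x for x in vec])
    return tokens

def attend(queries, keys, values):
    """Single-head scaled dot-product attention without learned Q/K/V projections.
    Assumes value vectors have the same dimension as the queries."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(d)])
    return out

def add(*seqs):
    """Element-wise sum of several equally shaped token sequences."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*seqs)]

def bidirectional_block(img, txt):
    """Residual + intramodal (self) + intermodal (cross) attention, in both directions."""
    new_img = add(img, attend(img, img, img), attend(img, txt, txt))
    new_txt = add(txt, attend(txt, txt, txt), attend(txt, img, img))
    return new_img, new_txt

def holistic_representation(img, txt):
    """Mean-pool all visual and text tokens into one patient-level vector."""
    tokens = img + txt
    d = len(tokens[0])
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]

rng = random.Random(0)
d, patch = 8, 2
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # toy 4x4 "radiograph"
proj = rand_matrix(d, patch * patch, rng)
word_table = rand_matrix(10, d, rng)    # hypothetical 10-word vocabulary
field_vectors = rand_matrix(3, d, rng)  # hypothetical 3 lab/demographic fields

img = image_to_visual_tokens(image, patch, proj)
txt = text_to_tokens([1, 4, 7], [0.5, 1.2, 3.0], word_table, field_vectors)
img, txt = bidirectional_block(img, txt)
rep = holistic_representation(img, txt)
print(len(img), len(txt), len(rep))  # 4 visual tokens, 6 text tokens, 8 dims
```

The point of the sketch is that both modalities end up as tokens of the same width. This lets a single attention block mix them directly instead of fusing separately learned, modality-specific features at the end.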
Related papers
- PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology [7.87900104748629]
We have meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks.
We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, InternLM, with this dataset to enhance instruction-based performance.
arXiv Detail & Related papers (2024-08-13T17:05:06Z)
- MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models [11.798375238713488]
MEDFuse is a framework that integrates structured and unstructured medical data.
It achieves over 90% F1 score in the 10-disease multi-label classification task.
arXiv Detail & Related papers (2024-07-17T04:17:09Z)
- Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis [3.8758525789991896]
An innovative multi-modal deep learning model is proposed to deeply integrate heterogeneous information from medical images and clinical reports.
For medical images, convolutional neural networks are used to extract high-dimensional features and capture key visual information.
For clinical report text, a bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism is used for deep semantic understanding.
arXiv Detail & Related papers (2024-05-23T02:22:10Z)
- A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images [4.576524795036682]
Disease diagnosis methods guided by contrastive learning (CL) have shown significant advantages in lesion feature representation.
We propose a clinical-oriented multi-level CL framework that aims to enhance the model's capacity to extract lesion features.
The proposed CL framework is validated on two public medical image datasets, EyeQ and Chest X-ray.
arXiv Detail & Related papers (2024-04-07T09:08:14Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images [52.527733226555206]
We investigate the use of four attribution methods to explain multiple instance learning models.
We study two datasets of acute myeloid leukemia with over 100,000 single-cell images.
We compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard.
arXiv Detail & Related papers (2023-03-15T14:00:11Z)
- This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text [56.32427751440426]
In clinical practice, such models must not only be accurate but also provide doctors with interpretable and helpful results.
We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention.
We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines.
arXiv Detail & Related papers (2022-10-16T10:12:07Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
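Two of the related papers (ProtoPatient and Select-ProtoNet) build on prototypical networks. A minimal sketch of the core prototypical-network classification rule follows, assuming patient embeddings have already been computed by some encoder. Each class prototype is the mean of that class's support embeddings, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes. The embeddings and subtype labels below are made-up toy values. Neither paper's encoder, label-wise attention nor meta-learner is shown.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_prototypes(support):
    """support maps a class label to its support embeddings; prototype = class mean."""
    return {label: mean(vecs) for label, vecs in support.items()}

def classify(query, prototypes):
    """Softmax over negative squared distances; returns (best label, probabilities)."""
    labels = list(prototypes)
    logits = [-squared_distance(query, prototypes[c]) for c in labels]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical few-shot episode: two subtypes, two 2-D support embeddings each.
support = {
    "subtype_a": [[0.0, 0.0], [0.2, 0.0]],
    "subtype_b": [[1.0, 1.0], [1.0, 0.8]],
}
protos = build_prototypes(support)
label, probs = classify([0.1, 0.1], protos)
print(label)  # subtype_a
```

Because a prototype is just an average, adding a new subtype needs only a handful of labeled examples and no retraining of the classifier head. This is what makes the approach attractive in the few-shot setting described above.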