Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency
Department
- URL: http://arxiv.org/abs/2304.01233v1
- Date: Mon, 3 Apr 2023 06:32:00 GMT
- Title: Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency
Department
- Authors: Sabri Boughorbel, Fethi Jarray, Abdulaziz Al Homaid, Rashid Niaz,
Khalid Alyafei
- Abstract summary: We are interested in outcome prediction and patient triage in hospital emergency department based on text information in chief complaints and vital signs recorded at triage.
We adapt Perceiver - a modality-agnostic transformer-based model that has shown promising results in several applications.
In the experimental analysis, we show that multi-modality improves the prediction performance compared with models trained solely on text or vital signs.
- Score: 0.03088120935391119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language modeling has shown impressive progress in generating compelling
text with good accuracy and high semantic coherence. An interesting research
direction is to augment these powerful models for specific applications using
contextual information. In this work, we explore multi-modal language modeling
for healthcare applications. We are interested in outcome prediction and
patient triage in the hospital emergency department based on text information in
chief complaints and vital signs recorded at triage. We adapt Perceiver - a
modality-agnostic transformer-based model that has shown promising results in
several applications. Since the vital-sign modality is represented in tabular
format, we modified the Perceiver position encoding to ensure permutation
invariance. We evaluated the multi-modal language model for the task of
diagnosis code prediction using the MIMIC-IV ED dataset with 120K visits. In the
experimental analysis, we show that multi-modality improves the prediction
performance compared with models trained solely on text or vital signs. We
identified disease categories for which multi-modality leads to performance
improvement and show that for these categories, vital signs have added
predictive power. By analyzing the cross-attention layer, we show how
multi-modality contributes to model predictions. This work gives interesting
insights into the development of multi-modal language models for healthcare
applications.
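The sketch below illustrates (it is not the authors' released code) the two ideas in the abstract: a permutation-invariant encoding of the tabular vital-sign modality, and a Perceiver-style latent array that cross-attends to the concatenated text and vital-sign tokens. All module names, dimensions, and the diagnosis-code output head are assumptions made for the example.

```python
import torch
import torch.nn as nn


class PermutationInvariantVitals(nn.Module):
    """Encode tabular vital signs without positional indices.

    Each column gets a learned feature embedding added to a projection of its
    value, so the set of tokens produced is unchanged if columns are permuted.
    """
    def __init__(self, n_features: int, d_model: int):
        super().__init__()
        self.feature_emb = nn.Embedding(n_features, d_model)  # one embedding per vital sign
        self.value_proj = nn.Linear(1, d_model)                # scalar reading -> d_model

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch, n_features) of normalized vital-sign readings
        ids = torch.arange(values.size(1), device=values.device)
        return self.value_proj(values.unsqueeze(-1)) + self.feature_emb(ids)


class PerceiverFusion(nn.Module):
    """A small latent array cross-attends to the concatenated text + vitals tokens."""
    def __init__(self, d_model: int = 128, n_latents: int = 32,
                 n_heads: int = 4, n_codes: int = 10):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_codes)  # n_codes diagnosis labels (assumed size)

    def forward(self, text_tokens: torch.Tensor, vitals_tokens: torch.Tensor):
        # text_tokens:   (batch, seq_len, d_model), e.g. embedded chief-complaint text
        # vitals_tokens: (batch, n_features, d_model) from PermutationInvariantVitals
        inputs = torch.cat([text_tokens, vitals_tokens], dim=1)
        latents = self.latents.unsqueeze(0).expand(inputs.size(0), -1, -1)
        fused, attn = self.cross_attn(latents, inputs, inputs)  # attn: latent -> input weights
        fused = self.self_block(fused)
        logits = self.head(fused.mean(dim=1))
        return logits, attn


# Illustrative usage with random stand-in inputs (6 hypothetical vital-sign columns).
vitals_enc = PermutationInvariantVitals(n_features=6, d_model=128)
model = PerceiverFusion()
text = torch.randn(2, 40, 128)           # stand-in for encoded chief-complaint tokens
vitals = vitals_enc(torch.randn(2, 6))   # column order does not matter
logits, attn = model(text, vitals)
```

The attention weights returned by the cross-attention call are the kind of signal one could inspect, as the paper does with its cross-attention layer, to estimate how much a prediction draws on the text tokens versus the vital-sign tokens.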
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - Towards Holistic Disease Risk Prediction using Small Language Models [2.137491464843808]
We introduce a framework that connects small language models to multiple data sources, aiming to predict the risk of various diseases simultaneously.
Our experiments encompass 12 different tasks within a multitask learning setup.
arXiv Detail & Related papers (2024-08-13T15:01:33Z) - CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate publicly available, state-of-the-art foundational vision-language models for chest X-ray interpretation.
We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation.
We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Multimodal Clinical Trial Outcome Prediction with Large Language Models [30.201189349890267]
We propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction.
LIFTED unifies different modality data by transforming them into natural language descriptions.
Then, LIFTED constructs unified noise-resilient encoders to extract information from modal-specific language descriptions.
arXiv Detail & Related papers (2024-02-09T16:18:38Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - PheME: A deep ensemble framework for improving phenotype prediction from
multi-modal data [42.56953523499849]
We present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction.
We leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions.
arXiv Detail & Related papers (2023-03-19T23:41:04Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)