MUFASA: Multimodal Fusion Architecture Search for Electronic Health
Records
- URL: http://arxiv.org/abs/2102.02340v1
- Date: Wed, 3 Feb 2021 23:48:54 GMT
- Title: MUFASA: Multimodal Fusion Architecture Search for Electronic Health
Records
- Authors: Zhen Xu, David R. So, Andrew M. Dai
- Abstract summary: We extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA).
We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable costs.
Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks.
- Score: 18.42914458055976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One important challenge of applying deep learning to electronic health
records (EHR) is the complexity of their multimodal structure. EHR usually
contains a mixture of structured (codes) and unstructured (free-text) data with
sparse and irregular longitudinal features -- all of which doctors utilize when
making decisions. In the deep learning regime, determining how different
modality representations should be fused together is a difficult problem, which
is often addressed by handcrafted modeling and intuition. In this work, we
extend state-of-the-art neural architecture search (NAS) methods and propose
MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across
multimodal fusion strategies and modality-specific architectures for the first
time. We demonstrate empirically that our MUFASA method outperforms established
unimodal NAS on public EHR data with comparable computation costs. In addition,
MUFASA produces architectures that outperform Transformer and Evolved
Transformer. Compared with these baselines on CCS diagnosis code prediction,
our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate
the ability to generalize to other EHR tasks. Studying our top architecture in
depth, we provide empirical evidence that MUFASA's improvements are derived
from its ability to both customize modeling for each data modality and find
effective fusion strategies.
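
For intuition, the sketch below shows what a joint search space over modality-specific encoders and fusion strategies can look like in PyTorch. It is a toy illustration, not MUFASA's actual search space or evolutionary algorithm: the operation sets, the random-search loop, and every name (ENCODER_OPS, Fusion, Candidate, sample_arch, evaluate) are hypothetical stand-ins.

```python
# Toy joint search over per-modality encoders and fusion strategies, in the
# spirit of MUFASA. Hypothetical names throughout; random search stands in
# for the paper's evolutionary search.
import random
import torch
import torch.nn as nn

D = 32  # shared hidden width for all modalities

ENCODER_OPS = {  # candidate encoders, chosen independently per modality
    "mlp":  lambda: nn.Sequential(nn.Linear(D, D), nn.ReLU()),
    "deep": lambda: nn.Sequential(nn.Linear(D, D), nn.ReLU(),
                                  nn.Linear(D, D), nn.ReLU()),
    "id":   lambda: nn.Identity(),
}
FUSION_CHOICES = ["sum", "max", "concat"]  # candidate fusion strategies

class Fusion(nn.Module):
    """Fuses two modality vectors according to a searched strategy."""
    def __init__(self, strategy: str):
        super().__init__()
        self.strategy = strategy
        self.proj = nn.Linear(2 * D, D)  # used only by "concat"
    def forward(self, a, b):
        if self.strategy == "sum":
            return a + b
        if self.strategy == "max":
            return torch.max(a, b)
        return self.proj(torch.cat([a, b], dim=-1))

class Candidate(nn.Module):
    """One sampled architecture: an encoder per modality plus a fusion op."""
    def __init__(self, enc_codes: str, enc_text: str, fusion: str):
        super().__init__()
        self.enc_codes = ENCODER_OPS[enc_codes]()  # structured-codes branch
        self.enc_text = ENCODER_OPS[enc_text]()    # free-text branch
        self.fusion = Fusion(fusion)
        self.head = nn.Linear(D, 5)                # e.g. 5 diagnosis classes
    def forward(self, codes, text):
        return self.head(self.fusion(self.enc_codes(codes),
                                     self.enc_text(text)))

def evaluate(model: nn.Module) -> float:
    """Stub fitness; a real search would use a validation metric."""
    with torch.no_grad():
        return -model(torch.randn(4, D), torch.randn(4, D)).var().item()

def sample_arch():
    return (random.choice(list(ENCODER_OPS)),
            random.choice(list(ENCODER_OPS)),
            random.choice(FUSION_CHOICES))

best = max((sample_arch() for _ in range(20)),
           key=lambda a: evaluate(Candidate(*a)))
print("best architecture:", best)
```

The point is only that the fusion strategy is a searchable choice alongside the per-modality encoders, so the search can trade them off jointly rather than fixing one by hand.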
Related papers
- POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-objective diffusion process.
Results were validated on two search spaces -- NAS201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z)
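
Since POMONAG's selection is Pareto-based, a generic reference point may help: the sketch below implements plain Pareto dominance and front extraction over (error, latency) pairs. It is not POMONAG's diffusion-based generator; dominates, pareto_front, and the sample numbers are illustrative only.

```python
# Generic Pareto-front extraction for multi-objective architecture
# selection; objectives are assumed to be minimized.
from typing import List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the non-dominated points (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# e.g. (top-1 error, latency in ms) for four candidate architectures
archs = [(0.12, 30.0), (0.10, 45.0), (0.15, 25.0), (0.13, 50.0)]
print(pareto_front(archs))  # -> [(0.12, 30.0), (0.10, 45.0), (0.15, 25.0)]
```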
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
However, NAS suffers from a key bottleneck: numerous architectures must be evaluated during the search.
We propose SMEM-NAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
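
The pairwise-comparison idea can be sketched generically: instead of regressing an architecture's absolute accuracy, a surrogate predicts which of two architectures is better, and candidates are ranked by win counts. The comparator below is a placeholder rule on toy encodings, not SMEM-NAS's learned model.

```python
# Rank candidate architectures by pairwise wins under a surrogate
# comparator, avoiding full evaluation of every candidate.
import random

def compare(a, b):
    """Surrogate comparator stub: True if a is predicted better than b.
    In practice this would be a learned classifier over encoded pairs."""
    return sum(a) > sum(b)  # placeholder rule on toy encodings

archs = [[random.randint(0, 3) for _ in range(6)] for _ in range(8)]
wins = {i: sum(compare(a, b) for j, b in enumerate(archs) if j != i)
        for i, a in enumerate(archs)}
ranking = sorted(wins, key=wins.get, reverse=True)
print("most promising candidates:", ranking[:3])
```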
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
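
As a generic illustration of the pattern described above: modality-specific connectors project each modality into a shared token space, and a sparse mixture-of-experts layer routes each token to one expert. This is a minimal sketch with invented sizes and names, not Uni-MoE's implementation.

```python
# Toy sparse MoE layer fed by per-modality connectors.
import torch
import torch.nn as nn

D, E = 64, 4  # shared hidden size, number of experts

class MoELayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Linear(D, E)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))
            for _ in range(E))
    def forward(self, x):                   # x: (tokens, D)
        weights = self.gate(x).softmax(-1)  # routing probabilities
        top = weights.argmax(-1)            # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top == e
            if mask.any():
                out[mask] = expert(x[mask]) * weights[mask, e].unsqueeze(-1)
        return out

# "Connectors" map each modality into the shared space before the MoE.
image_connector = nn.Linear(512, D)
text_connector = nn.Linear(300, D)
moe = MoELayer()
tokens = torch.cat([image_connector(torch.randn(2, 512)),
                    text_connector(torch.randn(3, 300))])
print(moe(tokens).shape)  # torch.Size([5, 64])
```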
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate advancing multimodal DFER performance by adapting SSL-pretrained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
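
The adaptation recipe can be sketched generically: keep the SSL-pretrained unimodal encoders frozen and train only a small fusion head on top. The linear layers below merely stand in for real pretrained audio and video encoders; all shapes are invented.

```python
# Frozen unimodal encoders + small trainable fusion head (generic pattern).
import torch
import torch.nn as nn

audio_enc = nn.Linear(128, 256)  # stand-in for a pretrained audio SSL encoder
video_enc = nn.Linear(512, 256)  # stand-in for a pretrained video SSL encoder
for p in list(audio_enc.parameters()) + list(video_enc.parameters()):
    p.requires_grad = False      # keep the pretrained encoders frozen

fusion_head = nn.Sequential(     # only this adapter is trained
    nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))  # 7 expression classes

a, v = torch.randn(8, 128), torch.randn(8, 512)
logits = fusion_head(torch.cat([audio_enc(a), video_enc(v)], dim=-1))
print(logits.shape)  # torch.Size([8, 7])
```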
- Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions [48.0590120095748]
We propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies.
We conduct thorough experiments on real-world multimodal EHR data and prediction tasks, and the results demonstrate that our framework achieves significant performance improvements over existing state-of-the-art methods.
arXiv Detail & Related papers (2024-01-20T15:14:14Z)
- Two heads are better than one: Enhancing medical representations by pre-training over structured and unstructured electronic health records [23.379185792773875]
We propose a unified deep learning-based medical pre-trained language model, named UMM-PLM, to automatically learn representative features from multimodal EHRs.
We first developed parallel unimodal representation modules to capture modality-specific characteristics, learning unimodal representations from each data source separately.
A cross-modal module was further introduced to model the interactions between different modalities.
arXiv Detail & Related papers (2022-01-25T06:14:49Z)
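
Cross-modal interaction modules of this kind are commonly realized as cross-attention; in the generic sketch below, text representations attend over structured-code representations. This shows the common pattern, not necessarily UMM-PLM's exact module.

```python
# Generic cross-modal interaction via cross-attention: queries come from
# one modality, keys/values from the other.
import torch
import torch.nn as nn

D = 64
cross_attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)

text = torch.randn(2, 16, D)   # unimodal text representations
codes = torch.randn(2, 8, D)   # unimodal structured-code representations

fused, attn_weights = cross_attn(query=text, key=codes, value=codes)
print(fused.shape)  # torch.Size([2, 16, 64])
```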
- An Approach for Combining Multimodal Fusion and Neural Architecture Search Applied to Knowledge Tracing [6.540879944736641]
We propose a sequential model-based optimization approach that combines multimodal fusion and neural architecture search within one framework.
We evaluate our method on two public real-world datasets, showing that the discovered model achieves superior performance.
arXiv Detail & Related papers (2021-11-08T13:43:46Z)
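
Sequential model-based optimization follows a simple loop: fit a cheap surrogate on the configurations evaluated so far, then spend the next expensive evaluation on the candidate the surrogate scores highest. The sketch below is generic SMBO with a random-forest surrogate and a stubbed objective, not the paper's exact procedure.

```python
# Generic SMBO loop: surrogate-guided choice of the next configuration to
# evaluate. true_score() stubs the expensive train-and-validate step; the
# 2-D config encoding is invented for illustration.
import random
from sklearn.ensemble import RandomForestRegressor

def true_score(cfg):
    """Expensive evaluation stub (real use: train and validate a model)."""
    return -(cfg[0] - 0.3) ** 2 - (cfg[1] - 0.7) ** 2

X = [[random.random(), random.random()] for _ in range(5)]  # warm start
y = [true_score(c) for c in X]

for _ in range(10):
    surrogate = RandomForestRegressor(n_estimators=50).fit(X, y)
    pool = [[random.random(), random.random()] for _ in range(100)]
    nxt = max(pool, key=lambda c: surrogate.predict([c])[0])
    X.append(nxt)
    y.append(true_score(nxt))  # only the chosen candidate is evaluated

best = X[max(range(len(y)), key=y.__getitem__)]
print("best config found:", best)
```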
- One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking [97.60915598958968]
We propose a one-shot neural ensemble architecture search (NEAS) solution that addresses two key challenges.
For the first challenge, we introduce a novel diversity-based metric to guide search space shrinking.
For the second challenge, we enable a new search dimension that learns layer sharing among different models for efficiency.
arXiv Detail & Related papers (2021-04-01T16:29:49Z)
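
The diversity idea can be illustrated generically: score each candidate operation by how often its predictions disagree with the others' on a shared batch, then prune the least diverse candidate. NEAS defines its own metric; the one below is a simple stand-in computed on fake logits.

```python
# Toy diversity-guided shrinking: prune the candidate whose predictions
# disagree least with the rest of the pool.
import torch

def disagreement(p, q):
    """Fraction of samples where two candidates predict different classes."""
    return (p.argmax(-1) != q.argmax(-1)).float().mean().item()

# Pretend logits from 4 candidate ops on a shared validation batch.
preds = {f"op{i}": torch.randn(64, 10) for i in range(4)}

diversity = {
    name: sum(disagreement(p, q)
              for other, q in preds.items() if other != name) / (len(preds) - 1)
    for name, p in preds.items()
}
dropped = min(diversity, key=diversity.get)  # least diverse candidate
print("shrink search space: remove", dropped)
```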
- CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification [102.89434996930387]
VI-ReID aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environments.
Existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations.
We propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS).
arXiv Detail & Related papers (2021-01-21T07:07:00Z)