Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification
- URL: http://arxiv.org/abs/2405.19363v1
- Date: Fri, 24 May 2024 16:51:10 GMT
- Title: Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification
- Authors: Yihe Wang, Nan Huang, Taida Li, Yujun Yan, Xiang Zhang
- Abstract summary: We introduce Medformer, a multi-granularity patching transformer tailored specifically for medical time series classification.
Our method incorporates three novel mechanisms to leverage the unique characteristics of medical time series.
- Score: 6.0233642055651115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare applications such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarker extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we introduce Medformer, a multi-granularity patching transformer tailored specifically for medical time series classification. Our method incorporates three novel mechanisms to leverage the unique characteristics of medical time series: cross-channel patching to leverage inter-channel correlations, multi-granularity embedding for capturing features at different scales, and two-stage (intra- and inter-granularity) multi-granularity self-attention for learning features and correlations within and among granularities. We conduct extensive experiments on five public datasets under both subject-dependent and challenging subject-independent setups. Results demonstrate Medformer's superiority over 10 baselines, achieving the top averaged ranking across five datasets on all six evaluation metrics. These findings underscore the significant impact of our method on healthcare applications, such as diagnosing Myocardial Infarction, Alzheimer's, and Parkinson's disease. We release the source code at https://github.com/DL4mHealth/Medformer.
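To make the described mechanisms more concrete, here is a minimal, hedged sketch of cross-channel patching and multi-granularity embedding followed by intra-granularity self-attention; the patch lengths, dimensions, and class names are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch only: cross-channel patching at several granularities,
# followed by self-attention within each granularity. Not the authors' code.
import torch
import torch.nn as nn


class MultiGranularityPatchEmbed(nn.Module):
    """Embed a multichannel series at several patch lengths (granularities)."""

    def __init__(self, n_channels: int, d_model: int, patch_lens=(2, 4, 8)):
        super().__init__()
        self.patch_lens = patch_lens
        # One linear projection per granularity; each patch spans ALL channels
        # (cross-channel patching), so its flattened size is n_channels * patch_len.
        self.projs = nn.ModuleList(
            [nn.Linear(n_channels * p, d_model) for p in patch_lens]
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, n_channels, time)
        tokens = []
        for p, proj in zip(self.patch_lens, self.projs):
            b, c, t = x.shape
            t_trim = (t // p) * p                        # drop the ragged tail
            patches = x[..., :t_trim].reshape(b, c, t_trim // p, p)
            patches = patches.permute(0, 2, 1, 3).reshape(b, t_trim // p, c * p)
            tokens.append(proj(patches))                 # (batch, n_patches, d_model)
        return tokens                                     # one token sequence per granularity


# Intra-granularity self-attention: attend within each granularity separately.
embed = MultiGranularityPatchEmbed(n_channels=19, d_model=64)
attn = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
eeg = torch.randn(2, 19, 128)                            # e.g. a 19-channel EEG segment
per_granularity = [attn(tok) for tok in embed(eeg)]
```

The second, inter-granularity attention stage described in the abstract would then let tokens from different patch lengths exchange information; it is omitted here for brevity.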
Related papers
- FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging [68.6715007665896]
FedMedICL is a unified framework and benchmark to holistically evaluate federated medical imaging challenges.
We comprehensively evaluate several popular methods on six diverse medical imaging datasets.
We find that a simple batch balancing technique surpasses advanced methods in average performance across FedMedICL experiments.
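As a rough illustration of what such a batch balancing technique can look like, the sketch below draws each training batch with roughly equal counts per class; the function name and sampling details are assumptions, not FedMedICL's implementation.

```python
# Hypothetical class-balanced batch sampler, offered only as an illustration.
import random
from collections import defaultdict


def balanced_batches(samples, labels, batch_size, n_batches):
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    classes = list(by_class)
    per_class = max(1, batch_size // len(classes))
    for _ in range(n_batches):
        batch = []
        for c in classes:
            # sample with replacement so rare classes still fill their share
            batch.extend(random.choices(by_class[c], k=per_class))
        random.shuffle(batch)
        yield batch
```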
arXiv Detail & Related papers (2024-07-11T19:12:23Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluates the performance of Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling.
For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views.
For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective via forward mapping classification (FMC) and reverse mapping regression (RMR).
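The FMC and RMR objectives are not detailed in this summary, so the following sketch only illustrates the general shape of a bidirectional cross-modal cycle-consistency loss under assumed linear mappings.

```python
# Generic bidirectional cycle-consistency sketch; purely illustrative, not Med-ST's objective.
import torch
import torch.nn as nn

d = 64
img_to_txt = nn.Linear(d, d)   # forward mapping (image -> text space)
txt_to_img = nn.Linear(d, d)   # reverse mapping (text -> image space)

img_feat = torch.randn(8, d)
txt_feat = torch.randn(8, d)

# Mapping a feature to the other modality's space and back should reconstruct it.
cycle_loss = (
    nn.functional.mse_loss(txt_to_img(img_to_txt(img_feat)), img_feat)
    + nn.functional.mse_loss(img_to_txt(txt_to_img(txt_feat)), txt_feat)
)
```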
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
- Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision [1.6245786035158123]
This paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes.
The framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations.
Experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications.
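As a hedged illustration of contrastive alignment between medical time-series embeddings and clinical-note embeddings, here is a standard InfoNCE-style loss; the paper's cross-attention encoders and dynamic tokenization scheme are not reproduced.

```python
# Illustrative InfoNCE-style alignment of paired time-series and note embeddings.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(ts_emb, note_emb, temperature=0.07):
    # ts_emb, note_emb: (batch, d) embeddings of paired time series and notes
    ts = F.normalize(ts_emb, dim=-1)
    txt = F.normalize(note_emb, dim=-1)
    logits = ts @ txt.t() / temperature            # pairwise similarities
    targets = torch.arange(len(ts))                # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2


loss = contrastive_alignment_loss(torch.randn(16, 128), torch.randn(16, 128))
```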
arXiv Detail & Related papers (2024-04-10T04:19:59Z)
- HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements.
We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods.
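A minimal sketch of the hypernetwork idea follows: a small network maps tabular EHR values to the weights of a layer applied to imaging features, so the imaging branch is conditioned on the clinical record. Dimensions and layer choices here are assumptions, not the paper's architecture.

```python
# Hypothetical hypernetwork-conditioned head: tabular data generates per-sample weights.
import torch
import torch.nn as nn


class HyperConditionedHead(nn.Module):
    def __init__(self, tab_dim: int, img_dim: int, out_dim: int):
        super().__init__()
        self.img_dim, self.out_dim = img_dim, out_dim
        # Hypernetwork: tabular features -> weights and bias of a linear head.
        self.hyper = nn.Linear(tab_dim, img_dim * out_dim + out_dim)

    def forward(self, img_feat: torch.Tensor, tab: torch.Tensor):
        params = self.hyper(tab)                                  # (batch, img_dim*out_dim + out_dim)
        w = params[:, : self.img_dim * self.out_dim].view(-1, self.out_dim, self.img_dim)
        b = params[:, self.img_dim * self.out_dim:]
        # Per-sample linear layer whose weights depend on the tabular record.
        return torch.bmm(w, img_feat.unsqueeze(-1)).squeeze(-1) + b


head = HyperConditionedHead(tab_dim=20, img_dim=256, out_dim=2)
logits = head(torch.randn(4, 256), torch.randn(4, 20))
```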
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Heterogeneous Graph Learning for Multi-modal Medical Data Analysis [6.3082663934391014]
We propose an effective graph-based framework called HetMed for fusing the multi-modal medical data.
HetMed captures the complex relationship between patients in a systematic way, which leads to more accurate clinical decisions.
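One common way to capture relationships between patients in a graph, shown here as an illustrative sketch only (not HetMed's actual construction), is to connect each patient to its nearest neighbors in feature space.

```python
# Illustrative k-nearest-neighbor patient graph built from multimodal feature vectors.
import torch


def knn_patient_graph(features: torch.Tensor, k: int = 5):
    # features: (n_patients, d); returns an (n_patients, n_patients) adjacency matrix
    sim = torch.nn.functional.cosine_similarity(
        features.unsqueeze(1), features.unsqueeze(0), dim=-1
    )
    sim.fill_diagonal_(0.0)                       # no self-loops
    topk = sim.topk(k, dim=-1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk, 1.0)                    # connect each patient to its k nearest
    return torch.maximum(adj, adj.t())            # symmetrize


adj = knn_patient_graph(torch.randn(10, 32), k=3)
```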
arXiv Detail & Related papers (2022-11-28T09:14:36Z)
- Improving Medical Predictions by Irregular Multimodal Electronic Health Records Modeling [19.346610191591143]
Health conditions among patients in intensive care units (ICUs) are monitored via electronic health records (EHRs).
Dealing with irregularity in each single modality and integrating it into multimodal representations to improve medical predictions is a challenging problem.
Our method first addresses irregularity in each single modality by dynamically incorporating hand-crafted imputation embeddings into learned embeddings via a gating mechanism.
We observe relative improvements of 6.5%, 3.6%, and 4.3% in F1 for time series, clinical notes, and multimodal fusion, respectively.
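The gating mechanism can be pictured roughly as below: a learned gate blends a hand-crafted imputation embedding with a learned embedding. Dimensions and the gate's exact form are illustrative assumptions.

```python
# Hypothetical gate that mixes imputation embeddings with learned embeddings.
import torch
import torch.nn as nn


class ImputationGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, learned: torch.Tensor, imputed: torch.Tensor):
        # learned, imputed: (batch, time, d_model)
        g = torch.sigmoid(self.gate(torch.cat([learned, imputed], dim=-1)))
        return g * learned + (1 - g) * imputed    # gate decides how much imputation to keep


fuse = ImputationGate(d_model=32)
out = fuse(torch.randn(4, 48, 32), torch.randn(4, 48, 32))
```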
arXiv Detail & Related papers (2022-10-18T20:29:26Z)
- Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
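A simple way to picture training for resilience to modality dropping, offered as an assumption-laden sketch rather than CMIM's actual method, is to randomly zero out entire modalities during training before fusing them.

```python
# Illustrative modality dropout: randomly drop whole modalities so the fused
# representation cannot rely on any single one being present at test-time.
import torch


def drop_modalities(modalities, p_drop=0.3, training=True):
    # modalities: list of (batch, d) tensors, one per modality
    kept = []
    for m in modalities:
        if training and torch.rand(1).item() < p_drop:
            kept.append(torch.zeros_like(m))      # simulate the modality being unavailable
        else:
            kept.append(m)
    return torch.cat(kept, dim=-1)                # simple concatenation fusion


fused = drop_modalities([torch.randn(8, 64), torch.randn(8, 64)], p_drop=0.3)
```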
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.