Related papers: Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model

URL: http://arxiv.org/abs/2509.22810v1
Date: Fri, 26 Sep 2025 18:14:43 GMT
Title: Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model
Authors: Jianheng Zhou, Chenyu Liu, Jinan Zhou, Yi Ding, Yang Liu, Haoran Luo, Ziyu Jia, Xinliang Zhou
Abstract summary: Sleep staging is essential for diagnosing sleep disorders and assessing neurological health. Existing automatic methods typically extract features from complex polysomnography (PSG) signals and train domain-specific models. We introduce a new paradigm for sleep staging that leverages large multimodal general-purpose models to emulate clinical diagnostic practices.
Score: 25.949760386728354
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sleep staging is essential for diagnosing sleep disorders and assessing neurological health. Existing automatic methods typically extract features from complex polysomnography (PSG) signals and train domain-specific models, which often lack intuitiveness and require large, specialized datasets. To overcome these limitations, we introduce a new paradigm for sleep staging that leverages large multimodal general-purpose models to emulate clinical diagnostic practices. Specifically, we convert raw one-dimensional PSG time-series into intuitive two-dimensional waveform images and then fine-tune a multimodal large model to learn from these representations. Experiments on three public datasets (ISRUC, MASS, SHHS) demonstrate that our approach enables general-purpose models, without prior exposure to sleep data, to acquire robust staging capabilities. Moreover, explanation analysis reveals that our model learned to mimic the visual workflow that human experts follow when staging sleep from PSG images. The proposed method consistently outperforms state-of-the-art baselines in accuracy and robustness, highlighting its efficiency and practical value for medical applications. The code for the signal-to-image pipeline and the PSG image dataset will be released.
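Illustration (not the authors' code): the abstract describes converting one-dimensional PSG time-series into two-dimensional waveform images, but the released pipeline is not reproduced on this page. The sketch below shows one plausible way such a conversion could work, using only the Python standard library. The channel names, the 100 Hz sampling rate, the 30-second epoch, the image size, and the function names `rasterize_waveform` and `epoch_to_image` are all assumptions made for this example.

```python
import math


def rasterize_waveform(signal, width=64, height=32):
    """Draw a 1-D signal as a binary image (list of rows; 1 marks the trace).

    Each image column covers a contiguous slice of samples, and the pixels
    between that slice's minimum and maximum amplitude are filled in.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal

    def to_row(value):
        # Row 0 is the top of the image, so larger amplitudes map to smaller rows.
        return round((hi - value) / span * (height - 1))

    n = len(signal)
    image = [[0] * width for _ in range(height)]
    for col in range(width):
        start = col * n // width
        stop = max(start + 1, (col + 1) * n // width)  # never an empty slice
        chunk = signal[start:stop]
        for row in range(to_row(max(chunk)), to_row(min(chunk)) + 1):
            image[row][col] = 1
    return image


def epoch_to_image(channels, width=64, rows_per_channel=32):
    """Stack one waveform band per channel, top to bottom, into one image."""
    image = []
    for samples in channels.values():
        image.extend(rasterize_waveform(samples, width, rows_per_channel))
    return image


# Hypothetical 30-second epoch sampled at an assumed 100 Hz, synthetic data only.
fs = 100
epoch = {
    "EEG": [math.sin(2 * math.pi * 10 * t / fs) for t in range(30 * fs)],
    "EOG": [math.sin(2 * math.pi * 0.5 * t / fs) for t in range(30 * fs)],
}
image = epoch_to_image(epoch)
print(len(image), "rows x", len(image[0]), "columns")
```

In practice the resulting grid would be saved as an image file and paired with a stage label for fine-tuning; how the authors render, scale, and lay out channels may differ from this sketch.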

Related papers

Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM [0.06282171844772422]
This study introduces a novel approach that leverages ImageBind, a multimodal embedding deep learning model, to integrate PSM data with dual-channel EOG signals for sleep stage classification. Our results demonstrate that fine-tuning ImageBind significantly improves classification accuracy, outperforming existing models.
arXiv Detail & Related papers (2025-06-07T20:18:45Z)
Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models [32.17651741681871]
We propose a Progressive Spectrum Diffusion Model (PSDM) for generating synthetic polyp images. PSDM integrates diverse clinical annotations, such as segmentation masks, bounding boxes, and colonoscopy reports, by transforming them into compositional prompts. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation.
arXiv Detail & Related papers (2025-02-25T08:22:45Z)
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions. Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet) with 600$\times$ faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
MSSC-BiMamba: Multimodal Sleep Stage Classification and Early Diagnosis of Sleep Disorders with Bidirectional Mamba [5.606144017978037]
We develop an automated model for sleep staging and disorder classification to enhance diagnostic accuracy and efficiency. Considering the characteristics of polysomnography (PSG) multi-lead sleep monitoring, we designed a multimodal sleep state classification model, MSSC-BiMamba. The model is the first to apply BiMamba to sleep staging with multimodal PSG data, showing substantial gains in computational and memory efficiency.
arXiv Detail & Related papers (2024-05-30T15:16:53Z)
Generative Medical Segmentation [5.4613210257624605]
Generative Medical Segmentation (GMS) is a novel approach leveraging a generative model to perform image segmentation. GMS employs a robust pre-trained vision foundation model to extract latent representations for images and corresponding ground truth masks. The design of GMS leads to fewer trainable parameters in the model, which reduces the risk of overfitting and enhances its capability.
arXiv Detail & Related papers (2024-03-27T02:16:04Z)
Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL). Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
Self-Supervised and Semi-Supervised Polyp Segmentation using Synthetic Data [16.356954231068077]
Early detection of colorectal polyps is of utmost importance for their treatment and for colorectal cancer prevention. Computer vision techniques have the potential to aid professionals in the diagnosis stage, where colonoscopies are manually carried out to examine the entirety of the patient's colon. The main challenge in medical imaging is the lack of data, and a further challenge specific to polyp segmentation approaches is the difficulty of manually labeling the available data. We propose an end-to-end model for polyp segmentation that integrates real and synthetic data to artificially increase the size of the datasets and aid the training when unlabeled samples are available.
arXiv Detail & Related papers (2023-07-22T09:57:58Z)
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input. DL models are sensitive to varying artifacts because these change the input data distribution between the training and testing phases. We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan. MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize correlations across subjects/patients and sub-modalities. We show the applicability of MGP-VAE on brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.