sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
- URL: http://arxiv.org/abs/2501.16329v1
- Date: Mon, 27 Jan 2025 18:59:55 GMT
- Title: sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
- Authors: Jingyuan Chen, Yuan Yao, Mie Anderson, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Jiebo Luo
- Abstract summary: We propose sDREAMER, a novel sleep stage scoring model.
We develop a mixture-of-modality-expert (MoME) model with three pathways for EEG, EMG, and mixed signals with partially shared weights.
Our model is trained with multi-channel inputs and can make classifications on either single-channel or multi-channel inputs.
- Abstract: Automatic sleep staging based on electroencephalography (EEG) and electromyography (EMG) signals is an important aspect of sleep-related research. Current sleep staging methods suffer from two major drawbacks. First, there are limited information interactions between modalities in the existing methods. Second, current methods do not develop unified models that can handle different sources of input. To address these issues, we propose sDREAMER, a novel sleep stage scoring model that emphasizes cross-modality interaction and per-channel performance. Specifically, we develop a mixture-of-modality-expert (MoME) model with three pathways for EEG, EMG, and mixed signals with partially shared weights. We further propose a self-distillation training scheme that enables additional information interaction across modalities. Our model is trained with multi-channel inputs and can make classifications on either single-channel or multi-channel inputs. Experiments demonstrate that our model outperforms existing transformer-based sleep scoring methods for multi-channel inference. For single-channel inference, our model also outperforms transformer-based models trained with single-channel signals.
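The abstract describes three expert pathways (EEG, EMG, mixed) with partially shared weights, plus a self-distillation loss that lets single-modality pathways learn from the mixed pathway. A minimal numpy sketch of that idea is below; the class names, feature dimensions, and the use of a simple averaged signal as the "mixed" input are all hypothetical simplifications, not the paper's actual transformer architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class Pathway:
    """One expert pathway: a projection shared across pathways followed by a
    pathway-specific head (a hypothetical stand-in for a MoME transformer)."""
    def __init__(self, shared_w, dim, n_classes, rng):
        self.shared_w = shared_w                                  # partially shared weights
        self.head = rng.standard_normal((dim, n_classes)) * 0.1   # pathway-specific weights

    def forward(self, x):
        h = np.tanh(x @ self.shared_w)        # shared feature extraction
        return softmax(h @ self.head)         # per-sleep-stage probabilities

def self_distillation_loss(student, teacher, eps=1e-9):
    """KL(teacher || student): single-modality pathways are pulled toward
    the mixed pathway's predictions."""
    return float(np.mean(
        np.sum(teacher * np.log((teacher + eps) / (student + eps)), axis=-1)))

rng = np.random.default_rng(0)
dim, n_classes, batch = 8, 3, 4               # e.g. 3 stages: Wake / NREM / REM
shared = rng.standard_normal((dim, dim)) * 0.1

eeg_path = Pathway(shared, dim, n_classes, rng)
emg_path = Pathway(shared, dim, n_classes, rng)
mix_path = Pathway(shared, dim, n_classes, rng)

eeg = rng.standard_normal((batch, dim))       # stand-ins for per-epoch features
emg = rng.standard_normal((batch, dim))

p_eeg = eeg_path.forward(eeg)
p_mix = mix_path.forward((eeg + emg) / 2)     # crude stand-in for the mixed signal
loss = self_distillation_loss(p_eeg, p_mix)
```

Because the pathways share `shared_w`, gradients from any modality would update the common features, while each `head` stays modality-specific; the distillation term then transfers the mixed pathway's knowledge to the single-channel experts, matching the paper's goal of single-channel inference from a multi-channel-trained model.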
Related papers
- wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals [0.6261444979025643]
wav2sleep is a unified model designed to operate on variable sets of input signals during training and inference.
It outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.
arXiv Detail & Related papers (2024-11-07T12:01:36Z) - Automatic Classification of Sleep Stages from EEG Signals Using Riemannian Metrics and Transformer Networks [6.404789669795639]
In sleep medicine, assessing the evolution of a subject's sleep often involves the costly manual scoring of electroencephalographic (EEG) signals.
We present a novel way of integrating learned signal-wise features into these matrices without sacrificing their Symmetric Positive Definite (SPD) nature.
arXiv Detail & Related papers (2024-10-18T06:49:52Z) - A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation [15.29891397291197]
Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to make the model jointly generate audio and video.
To enhance alignment between audio-video pairs, we introduce two novel mechanisms in our model.
arXiv Detail & Related papers (2024-09-26T05:39:52Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data availability or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
arXiv Detail & Related papers (2024-02-22T17:25:01Z) - Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech [0.0]
Alzheimer's disease (AD) is a complex neurocognitive disease and the main cause of dementia.
We propose new methods for detecting AD patients that capture both intra- and cross-modal interactions.
Experiments conducted on the ADReSS and ADReSSo Challenges demonstrate the efficacy of our approaches over existing ones.
arXiv Detail & Related papers (2023-05-25T18:18:09Z) - Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization [34.65357110940456]
This paper focuses on speaker diarization and proposes to conduct bi-directional knowledge transfer between single- and multi-channel models alternately.
We introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs.
Experimental results on two-speaker data show that the proposed method mutually improved single- and multi-channel speaker diarization performances.
arXiv Detail & Related papers (2022-10-07T11:03:32Z) - Multi-Channel End-to-End Neural Diarization with Distributed Microphones [53.99406868339701]
We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input.
We also propose a model adaptation method using only single-channel recordings.
arXiv Detail & Related papers (2021-10-10T03:24:03Z) - Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
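MPL's pair of interacting online and offline models, inspired by the mean teacher method, typically relies on a momentum (exponential moving average) update in which the offline model slowly tracks the online one. A one-step numpy sketch of that update is below; the parameter names and the momentum value are illustrative, not taken from the paper.

```python
import numpy as np

def ema_update(offline_params, online_params, alpha=0.999):
    """Momentum (EMA) update: the offline model becomes a slow-moving
    average of the online model, mean-teacher style."""
    return {k: alpha * offline_params[k] + (1 - alpha) * online_params[k]
            for k in offline_params}

# Hypothetical one-step illustration with tiny "parameter" tensors.
online = {"w": np.array([1.0]), "b": np.array([0.5])}
offline = {"w": np.array([0.0]), "b": np.array([0.0])}
offline = ema_update(offline, online, alpha=0.9)
# With alpha=0.9: offline["w"] -> 0.1, offline["b"] -> 0.05
```

In a semi-supervised loop, the offline (teacher) model would generate pseudo-labels for unlabeled audio, the online (student) model would train on them, and this update would refresh the teacher after each step.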
arXiv Detail & Related papers (2021-06-16T16:24:55Z) - Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering [51.633889765162685]
CHARM is a method for training a single neural network across inconsistent input channels.
We perform experiments on four EEG classification datasets and demonstrate the efficacy of CHARM.
arXiv Detail & Related papers (2020-10-21T12:32:34Z)
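Training one network across inconsistent EEG channel sets, as CHARM does, can be sketched as a differentiable soft assignment of whatever input channels are present onto a fixed set of canonical slots. The numpy sketch below shows that idea in its simplest form; the function name, the affinity-score matrix, and the shapes are assumptions for illustration, not CHARM's actual mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_channel_reorder(x, scores):
    """Map a variable set of input channels onto fixed canonical slots
    via a soft (hence differentiable) assignment matrix.

    x:      (n_in, t)       input channels over time
    scores: (n_slots, n_in) learned slot-to-channel affinities (hypothetical)
    """
    assign = softmax(scores, axis=-1)   # each slot is a convex mix of inputs
    return assign @ x                   # (n_slots, t) canonical representation

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 10))        # 5 available channels, 10 time steps
scores = rng.standard_normal((4, 5))    # map them onto 4 canonical slots
y = soft_channel_reorder(x, scores)
```

Because the assignment is a softmax rather than a hard permutation, gradients flow through `scores`, so the network can learn which physical channels should feed each canonical slot regardless of how a given dataset orders its electrodes.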
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.