MSA-CNN: A Lightweight Multi-Scale CNN with Attention for Sleep Stage Classification
- URL: http://arxiv.org/abs/2501.02949v1
- Date: Mon, 06 Jan 2025 11:46:02 GMT
- Title: MSA-CNN: A Lightweight Multi-Scale CNN with Attention for Sleep Stage Classification
- Authors: Stephan Goerttler, Yucheng Wang, Emadeldeen Eldele, Min Wu, Fei He
- Abstract summary: We introduce Multi-Scale and Attention Convolutional Neural Network (MSA-CNN), a lightweight architecture featuring as few as 10,000 parameters.
Model complexity is further reduced by separating temporal and spatial feature extraction and using cost-effective global spatial convolutions.
Our evaluation, based on repeated cross-validation and re-evaluation of all baseline models, demonstrated that the large MSA-CNN outperformed all baseline models on all three datasets in terms of accuracy and Cohen's kappa.
- Score: 14.221889446444433
- Abstract: Recent advancements in machine learning-based signal analysis, coupled with open data initiatives, have fuelled efforts in automatic sleep stage classification. Despite the proliferation of classification models, few have prioritised reducing model complexity, which is a crucial factor for practical applications. In this work, we introduce Multi-Scale and Attention Convolutional Neural Network (MSA-CNN), a lightweight architecture featuring as few as ~10,000 parameters. MSA-CNN leverages a novel multi-scale module employing complementary pooling to eliminate redundant filter parameters and dense convolutions. Model complexity is further reduced by separating temporal and spatial feature extraction and using cost-effective global spatial convolutions. This separation of tasks not only reduces model complexity but also mirrors the approach used by human experts in sleep stage scoring. We evaluated both small and large configurations of MSA-CNN against nine state-of-the-art baseline models across three public datasets, treating univariate and multivariate models separately. Our evaluation, based on repeated cross-validation and re-evaluation of all baseline models, demonstrated that the large MSA-CNN outperformed all baseline models on all three datasets in terms of accuracy and Cohen's kappa, despite its significantly reduced parameter count. Lastly, we explored various model variants and conducted an in-depth analysis of the key modules and techniques, providing deeper insights into the underlying mechanisms. The code for our models, baselines, and evaluation procedures is available at https://github.com/sgoerttler/MSA-CNN.
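For illustration, below is a minimal PyTorch sketch of the two ideas named in the abstract: multi-scale temporal convolutions merged by pooling, followed by a single global spatial convolution across EEG channels. All module names, kernel sizes, and channel counts here are illustrative assumptions, not the authors' implementation; the actual code is in the repository linked above.

```python
# Illustrative sketch only: the real MSA-CNN lives at the linked repository.
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel temporal branches at different time scales, merged after pooling."""
    def __init__(self, in_ch=1, out_ch=16, scales=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AvgPool1d(s) if s > 1 else nn.Identity(),   # coarser view of the signal per scale
                nn.Conv1d(in_ch, out_ch, kernel_size=7, padding=3),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(128),                     # pool every scale to a common length
            )
            for s in scales
        ])

    def forward(self, x):                    # x: (batch, 1, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

class SleepStagerSketch(nn.Module):
    """Temporal features per EEG channel first, then one global spatial convolution."""
    def __init__(self, n_eeg_channels=2, n_classes=5):
        super().__init__()
        self.temporal = MultiScaleBlock()                       # 3 scales x 16 filters = 48 features
        self.spatial = nn.Conv2d(48, 32, kernel_size=(n_eeg_channels, 1))  # mixes all channels at once
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):                    # x: (batch, eeg_channels, time)
        b, c, t = x.shape
        f = self.temporal(x.reshape(b * c, 1, t))          # temporal features, one channel at a time
        f = f.reshape(b, c, 48, 128).permute(0, 2, 1, 3)   # (batch, features, eeg_channels, time)
        return self.head(torch.relu(self.spatial(f)))

model = SleepStagerSketch()
print(model(torch.randn(4, 2, 3000)).shape)       # torch.Size([4, 5]): one logit per sleep stage
print(sum(p.numel() for p in model.parameters())) # small parameter count, in the spirit of the paper
```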
Related papers
- DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers [34.282971510732736]
We introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture.
A composition of weak models can exhibit high diversity and the union of them can significantly boost the accuracy upper bound.
We deploy DiTMoS on the Nucleo STM32F767ZI board and evaluate it on three time-series datasets for human activity recognition, keyword spotting, and emotion recognition.
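As a rough sketch of the selector-classifiers idea (names and sizes are illustrative, not DiTMoS's code): a small selector network routes each input to one of several weak classifiers, and only the chosen classifier's output is used.

```python
# Hedged sketch of a selector-classifiers architecture, not the paper's implementation.
import torch
import torch.nn as nn

class SelectorClassifiers(nn.Module):
    """A small selector routes each input to one of several weak classifiers."""
    def __init__(self, in_dim=64, n_classes=6, n_weak=4):
        super().__init__()
        self.selector = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_weak))
        self.classifiers = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))
            for _ in range(n_weak)
        ])

    def forward(self, x):                        # x: (batch, in_dim)
        choice = self.selector(x).argmax(dim=1)  # which weak model to trust, per sample
        logits = torch.stack([clf(x) for clf in self.classifiers], dim=1)
        # On a microcontroller only the chosen classifier would be executed;
        # here all are run and the selected output is gathered for clarity.
        return logits[torch.arange(x.shape[0]), choice]

model = SelectorClassifiers()
print(model(torch.randn(8, 64)).shape)  # torch.Size([8, 6])
```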
arXiv Detail & Related papers (2024-03-14T02:11:38Z)
- A model for multi-attack classification to improve intrusion detection performance using deep learning approaches [0.0]
The objective here is to create a reliable intrusion detection mechanism to help identify malicious attacks.
A deep learning-based solution framework is developed, consisting of three approaches.
The first approach is a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained with seven optimizers: Adamax, SGD, Adagrad, Adam, RMSprop, Nadam, and Adadelta.
The models self-learn the features and classify attacks in a multi-attack classification setting.
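A minimal sketch of such an optimizer comparison in PyTorch, with a placeholder LSTM classifier and dummy data standing in for the paper's setup:

```python
# Illustrative optimizer sweep; dimensions and data are placeholders, not the paper's setup.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, in_dim=41, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])         # classify from the last time step

optimizers = {
    "adamax": torch.optim.Adamax, "sgd": torch.optim.SGD,
    "adagrad": torch.optim.Adagrad, "adam": torch.optim.Adam,
    "rmsprop": torch.optim.RMSprop, "nadam": torch.optim.NAdam,
    "adadelta": torch.optim.Adadelta,
}

x, y = torch.randn(32, 10, 41), torch.randint(0, 5, (32,))  # dummy batch
for name, opt_cls in optimizers.items():
    model = LSTMClassifier()                                 # fresh model per optimizer
    opt = opt_cls(model.parameters(), lr=1e-3)
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"{name}: loss after one step = {loss.item():.3f}")
```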
arXiv Detail & Related papers (2023-10-25T05:38:44Z)
- Keep It Simple: CNN Model Complexity Studies for Interference Classification Tasks [7.358050500046429]
We study the trade-off amongst dataset size, CNN model complexity, and classification accuracy under various levels of classification difficulty.
Our study, based on three wireless datasets, shows that a simpler CNN model with fewer parameters can perform just as well as a more complex model.
arXiv Detail & Related papers (2023-03-06T17:53:42Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset and the FashionMNIST vs MNIST dataset.
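The batch-ensemble mechanism referenced here follows Wen et al. (2020): one shared weight matrix is modulated by cheap per-member rank-1 factors. A minimal sketch of such a layer, with illustrative shapes, might look as follows; the variance across members can serve as an OOD signal.

```python
# Hedged sketch of a batch-ensemble linear layer; shapes are illustrative.
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    def __init__(self, in_dim, out_dim, n_members=4):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim, bias=False)     # weights shared by all members
        self.r = nn.Parameter(torch.randn(n_members, in_dim))    # input-side rank-1 factor
        self.s = nn.Parameter(torch.randn(n_members, out_dim))   # output-side rank-1 factor

    def forward(self, x, member):           # x: (batch, in_dim), member: int
        # Equivalent to applying weight W * outer(s_m, r_m) for member m.
        return self.shared(x * self.r[member]) * self.s[member]

layer = BatchEnsembleLinear(16, 8, n_members=4)
x = torch.randn(5, 16)
outputs = torch.stack([layer(x, m) for m in range(4)])  # per-member predictions
print(outputs.var(dim=0).mean())  # disagreement across members, usable as an OOD signal
```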
arXiv Detail & Related papers (2022-06-26T16:00:22Z)
- Animal Behavior Classification via Accelerometry Data and Recurrent Neural Networks [11.099308746733028]
We study the classification of animal behavior using accelerometry data through various recurrent neural network (RNN) models.
We evaluate the classification performance and complexity of the considered models.
We also include two state-of-the-art convolutional neural network (CNN)-based time-series classification models in the evaluations.
arXiv Detail & Related papers (2021-11-24T23:28:25Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, statistical models combined with the roofline model, and a refined roofline model.
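As a rough illustration of the stacked-model idea (not ANNETTE's actual benchmark-derived models), one can stack base regressors over per-layer features such as MAC counts and tensor sizes:

```python
# Illustrative stacked latency estimator on synthetic per-layer features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))          # e.g. [MACs, input size, output size], normalised
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)  # synthetic layer latency

stack = StackingRegressor(
    estimators=[("linear", LinearRegression()),           # roofline-like linear term
                ("forest", RandomForestRegressor(n_estimators=50, random_state=0))],
    final_estimator=LinearRegression(),                   # meta-model combines base predictions
)
stack.fit(X[:150], y[:150])
print("held-out latency estimates:", stack.predict(X[150:155]).round(3))
```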
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
The Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical/Polyadic (CP) decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
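A minimal sketch of a rank-R hidden layer on matrix-shaped inputs, assuming a CP-form weight for each hidden unit so the input is contracted mode by mode rather than vectorized (shapes and rank below are illustrative):

```python
# Hedged sketch of a CP-decomposed hidden layer; not the paper's implementation.
import torch
import torch.nn as nn

class RankRLayer(nn.Module):
    def __init__(self, dims=(9, 10), rank=3, hidden=32):
        super().__init__()
        d1, d2 = dims
        self.a1 = nn.Parameter(torch.randn(hidden, rank, d1) * 0.1)  # mode-1 factors
        self.a2 = nn.Parameter(torch.randn(hidden, rank, d2) * 0.1)  # mode-2 factors

    def forward(self, x):                    # x: (batch, d1, d2)
        # <X, sum_r a1_r (outer) a2_r>, computed without forming the full weight tensor
        z = torch.einsum("bij,hri,hrj->bh", x, self.a1, self.a2)
        return torch.tanh(z)

layer = RankRLayer()
patch = torch.randn(4, 9, 10)  # e.g. a 3x3 spatial patch with 10 spectral bands, kept as a 9x10 array
print(layer(patch).shape)      # torch.Size([4, 32])
```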
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
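A hedged sketch of the atom-coefficient idea: each convolution kernel is assembled as a linear combination of a small shared dictionary of spatial atoms, so filters share parameters instead of being learned freely. Sizes below are illustrative, not the paper's configuration.

```python
# Illustrative atom-coefficient decomposed convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, n_atoms=6, k=3):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(n_atoms, k, k) * 0.1)           # shared spatial atoms
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, n_atoms) * 0.1)  # per-filter mixing weights

    def forward(self, x):
        # Assemble kernels as coefficient-weighted sums of the shared atoms.
        weight = torch.einsum("oin,nkl->oikl", self.coeff, self.atoms)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)

conv = AtomConv2d(3, 16)
print(conv(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
# Parameters: 6*9 atoms + 16*3*6 coefficients, versus 16*3*9 free conv weights.
```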
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Model Extraction Attacks against Recurrent Neural Networks [1.2891210250935146]
We study the threats of model extraction attacks against recurrent neural networks (RNNs).
We discuss whether a model with higher accuracy can be extracted with a simple RNN from a long short-term memory (LSTM) network.
We then show that a model with a higher accuracy can be extracted efficiently, especially through configuring a loss function and a more complex architecture.
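A minimal sketch of this extraction setting: a substitute model is trained purely on input/output pairs queried from a black-box victim. Both architectures below are placeholders standing in for the paper's LSTM victim and simpler RNN substitute.

```python
# Illustrative black-box extraction loop with placeholder models.
import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, cell):
        super().__init__()
        self.rnn = cell(8, 16, batch_first=True)
        self.fc = nn.Linear(16, 2)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.rnn(x)
        return self.fc(out[:, -1])

victim = SeqClassifier(nn.LSTM)              # black box: only its outputs are observable
substitute = SeqClassifier(nn.RNN)           # simpler architecture for the extracted copy
opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)

queries = torch.randn(256, 12, 8)            # attacker-chosen query sequences
with torch.no_grad():
    labels = victim(queries).argmax(dim=1)   # victim's hard labels on the queries
for _ in range(100):                         # distil the victim into the substitute
    loss = nn.functional.cross_entropy(substitute(queries), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```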
arXiv Detail & Related papers (2020-02-01T01:47:50Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
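A simplified sketch of layer-wise fusion, substituting a one-to-one Hungarian matching (scipy's linear_sum_assignment) for the paper's optimal-transport coupling: neurons of one network are aligned to the other before averaging. In a full network the matching must also permute the next layer's input weights; that step is omitted here.

```python
# Simplified neuron alignment + averaging; a stand-in for the OT-based fusion.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layer(w_a, w_b):
    """Align rows (neurons) of w_b to w_a, then average. w_*: (out, in)."""
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(-1)  # pairwise neuron distances
    rows, cols = linear_sum_assignment(cost)                   # one-to-one neuron matching
    return 0.5 * (w_a + w_b[cols])                             # "one-shot" fused weights

rng = np.random.default_rng(0)
w_a = rng.normal(size=(4, 8))
w_b = w_a[[2, 0, 3, 1]] + 0.01 * rng.normal(size=(4, 8))  # same neurons, permuted
fused = fuse_layer(w_a, w_b)
print(np.abs(fused - w_a).max())  # small: the alignment undid the permutation
```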
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.