Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening
- URL: http://arxiv.org/abs/2601.11896v1
- Date: Sat, 17 Jan 2026 03:35:39 GMT
- Title: Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening
- Authors: Ngoc-Khai Hoang, Thi-Nhu-Mai Nguyen, Huy-Hieu Pham
- Abstract summary: This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment. The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
- Score: 0.7136933021609076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early identification of stroke symptoms is essential for enabling timely intervention and improving patient outcomes, particularly in prehospital settings. This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment. The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness. Facial dynamics are represented using landmark based features and modeled with a Transformer architecture to capture temporal dependencies. Speech signals are converted into mel spectrograms and processed using an Audio Spectrogram Transformer, while upper-body pose sequences are analyzed with an MLP-Mixer network to model spatiotemporal motion patterns. The extracted modality specific representations are combined through an attention-based fusion mechanism to effectively learn cross modal interactions. Experiments conducted on a self-collected dataset of 222 videos from 37 subjects demonstrate that the proposed multimodal model consistently outperforms unimodal baselines, achieving 95.83% accuracy and a 96.00% F1-score. The model attains a strong balance between sensitivity and specificity and successfully detects all stroke cases in the test set. These results highlight the potential of multimodal learning and transfer learning for early stroke screening, while emphasizing the need for larger, clinically representative datasets to support reliable real-world deployment.
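The attention-based fusion step described in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, assuming a simple learned scoring vector and a softmax over stacked modality embeddings; the embedding size and scoring scheme are assumptions, not the authors' published implementation.

```python
import numpy as np

def attention_fuse(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Softmax-attention fusion of per-modality embeddings.

    embeddings: (num_modalities, embed_dim) array, one row per modality
    w:          (embed_dim,) scoring vector (learned in the real model)
    Returns the (embed_dim,) fused representation.
    """
    scores = embeddings @ w                          # one relevance score per modality
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities
    return weights @ embeddings                      # convex combination of embeddings

# Toy usage: three 8-dimensional stand-ins for the face, speech, and pose encoders.
rng = np.random.default_rng(0)
face, speech, pose = rng.standard_normal((3, 8))
fused = attention_fuse(np.stack([face, speech, pose]), rng.standard_normal(8))
print(fused.shape)  # (8,)
```

Because the softmax weights sum to one, the fused vector is a convex combination of the modality embeddings, letting the model down-weight an uninformative modality (e.g. noisy audio) per sample.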
Related papers
- Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis [14.922065513695294]
Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA).
To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention.
To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
arXiv Detail & Related papers (2026-02-16T14:48:24Z) - Automated Lesion Segmentation of Stroke MRI Using nnU-Net: A Comprehensive External Validation Across Acute and Chronic Lesions [0.0]
We evaluate stroke lesion segmentation using the nnU-Net framework across multiple publicly available MRI datasets.
Across stroke stages, models showed robust generalisation, with segmentation accuracy approaching reported inter-rater reliability.
In acute stroke, DWI-trained models consistently outperformed FLAIR-based models, with only modest gains from multimodal combinations.
In chronic stroke, increasing training set size improved performance, with diminishing returns beyond several hundred cases.
arXiv Detail & Related papers (2026-01-13T16:29:20Z) - FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology.
FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z) - Machine Learning Approaches to Clinical Risk Prediction: Multi-Scale Temporal Alignment in Electronic Health Records [2.9576397177561087]
This study proposes a risk prediction method based on a Multi-Scale Temporal Alignment Network (MSTAN).
It addresses the challenges of temporal irregularity, sampling interval differences, and multi-scale dynamic dependencies in Electronic Health Records (EHR).
Experiments conducted on publicly available EHR datasets show that the proposed model outperforms mainstream baselines in accuracy, recall, precision, and F1-Score.
arXiv Detail & Related papers (2025-11-26T16:33:59Z) - From Prototypes to Sparse ECG Explanations: SHAP-Driven Counterfactuals for Multivariate Time-Series Multi-class Classification [8.113866195465976]
We propose a prototype-driven framework for generating sparse counterfactual explanations tailored to 12-lead ECG classification models.
Our method employs SHAP-based thresholds to identify critical signal segments and convert them into interval rules.
We evaluate three variants of our approach, Original, Sparse, and Aligned Sparse, with class-specific performance ranging from 98.9% validity for MI to challenges with hypertrophy (HYP) detection.
arXiv Detail & Related papers (2025-10-22T12:09:50Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.
It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.
Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis [2.355460994057843]
This study proposes a novel and unified deep learning framework that achieves state-of-the-art performance across different signal types.
Unlike prior work, we scientifically increase signal complexity, which resulted in the best predictions.
The architecture requires 130 MB of memory and processes each sample in 10 ms, suggesting suitability for deployment on low-end or wearable devices.
arXiv Detail & Related papers (2025-07-16T21:38:10Z) - Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts.
A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z) - FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units [40.86547778808649]
We propose a novel framework called FauForensics for detecting audio-visual deepfakes.
Our method computes fine-grained frame-wise audiovisual similarities via a dedicated fusion module.
Experiments on FakeAVCeleb and LAV-DF show state-of-the-art (SOTA) performance and superior cross-dataset generalizability, with gains of up to 4.83% on average.
arXiv Detail & Related papers (2025-05-13T07:18:07Z) - CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis [50.56875995511431]
We introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data.
Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings.
arXiv Detail & Related papers (2024-11-01T15:54:07Z) - Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z) - Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction [10.832917897850361]
We studied several variants of BERT (Bidirectional Encoder Representations from Transformers).
We evaluated these methods using a direct temporal relations dataset which is a semantically focused subset of the 2012 i2b2 temporal relations challenge dataset.
Results: RoBERTa, which employs better pre-training strategies, including a 10x larger corpus, improved the overall F measure by 0.0864 absolute (on a 1.00 scale), reducing the error rate by 24% relative to the previous state-of-the-art performance achieved with a support vector machine (SVM) model.
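The two reported figures can be cross-checked arithmetically. The 0.36 value below is inferred from them, not stated in the abstract: a 0.0864 absolute F-measure gain that amounts to a 24% relative error reduction implies a prior error rate of about 0.36, i.e. a prior F measure near 0.64.

```python
# Cross-check: absolute F-measure gain vs. relative error-rate reduction.
delta_f = 0.0864            # reported absolute F-measure improvement
relative_reduction = 0.24   # reported relative error-rate reduction
prev_error = delta_f / relative_reduction  # implied prior error rate (1 - F)
print(round(prev_error, 2), round(1 - prev_error, 2))  # 0.36 0.64
```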
arXiv Detail & Related papers (2020-04-13T22:01:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.