Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
- URL: http://arxiv.org/abs/2403.06734v1
- Date: Mon, 11 Mar 2024 13:56:57 GMT
- Title: Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
- Authors: Keshara Weerasinghe, Saahith Janapati, Xueren Ge, Sion Kim, Sneha
Iyer, John A. Stankovic, Homa Alemzadeh
- Abstract summary: This paper presents CognitiveEMS, an end-to-end wearable cognitive assistant system.
It can act as a collaborative virtual partner engaging in the real-time acquisition and analysis of multimodal data from an emergency scene.
- Score: 4.669165383466683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emergency Medical Services (EMS) responders often operate under
time-sensitive conditions, facing cognitive overload and inherent risks,
requiring essential skills in critical thinking and rapid decision-making. This
paper presents CognitiveEMS, an end-to-end wearable cognitive assistant system
that can act as a collaborative virtual partner engaging in the real-time
acquisition and analysis of multimodal data from an emergency scene and
interacting with EMS responders through Augmented Reality (AR) smart glasses.
CognitiveEMS processes the continuous streams of data in real-time and
leverages edge computing to provide assistance in EMS protocol selection and
intervention recognition. We address key technical challenges in real-time
cognitive assistance by introducing three novel components: (i) a Speech
Recognition model that is fine-tuned for real-world medical emergency
conversations using simulated EMS audio recordings, augmented with synthetic
data generated by large language models (LLMs); (ii) an EMS Protocol Prediction
model that combines state-of-the-art (SOTA) tiny language models with EMS
domain knowledge using graph-based attention mechanisms; (iii) an EMS Action
Recognition module which leverages multimodal audio and video data and protocol
predictions to infer the intervention/treatment actions taken by the responders
at the incident scene. Our results show that for speech recognition we achieve
superior performance compared to SOTA (WER of 0.290 vs. 0.618) on
conversational data. Our protocol prediction component also significantly
outperforms SOTA (top-3 accuracy of 0.800 vs. 0.200) and the action recognition
achieves an accuracy of 0.727, while maintaining an end-to-end latency of 3.78s
for protocol prediction on the edge and 0.31s on the server.
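The two headline metrics in the abstract, word error rate (WER) and top-3 accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the lowercase whitespace tokenization, and the example protocol labels are assumptions made here for illustration only.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: lowercase whitespace tokenization; real ASR evaluations
    # usually apply a fuller text normalization step first.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of samples whose true label is among the k highest-ranked predictions."""
    if not labels or len(ranked_predictions) != len(labels):
        raise ValueError("need one ranked prediction list per label")
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("patient has chest pain", "patient has chess pain"))

    # Hypothetical protocol labels, ranked from most to least likely.
    ranked = [
        ["chest pain", "stroke", "seizure", "overdose"],
        ["stroke", "seizure", "overdose", "chest pain"],
    ]
    print(top_k_accuracy(ranked, ["seizure", "chest pain"], k=3))
```

Lower WER is better and higher top-3 accuracy is better, which is the direction of the comparisons reported in the abstract (0.290 vs. 0.618 and 0.800 vs. 0.200).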
Related papers
- Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening [0.7136933021609076]
This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment.
The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
arXiv Detail & Related papers (2026-01-17T03:35:39Z) - A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning [7.284746127785293]
We present EMSGlass, a smart-glasses system powered by EMSNet, and EMSServe, a low-latency multimodal serving framework tailored to EMS scenarios.
EMSNet integrates text, vital signs, and scene images to construct a unified real-time understanding of EMS incidents.
EMSServe achieves 1.9x -- 11.7x speedup over direct PyTorch multimodal inference.
arXiv Detail & Related papers (2025-11-17T07:27:52Z) - EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services [3.0776354206437664]
EgoEMS is the first end-to-end, high-fidelity, multimodal, multiperson dataset capturing over 20 hours of realistic, procedural EMS activities.
Developed in collaboration with EMS experts and aligned with national standards, EgoEMS is captured using an open-source, low-cost, and replicable data collection system.
We present a suite of benchmarks for real-time multimodal keystep recognition and action quality estimation, essential for developing AI support tools for EMS.
arXiv Detail & Related papers (2025-11-13T02:55:40Z) - DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services [49.70819009392778]
Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers.
This study aimed to develop and evaluate a taxonomy-grounded, multi-agent system for simulating realistic scenarios.
arXiv Detail & Related papers (2025-10-24T08:01:21Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.
It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.
Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - Holistic Artificial Intelligence in Medicine; improved performance and explainability [4.862319939462255]
xHAIM (Explainable HAIM) is a novel framework leveraging Generative AI to enhance both prediction and explainability.
xHAIM improves average AUC from 79.9% to 90.3% across chest pathology and operative tasks.
It transforms AI from a black-box predictor into an explainable decision support system, enabling clinicians to interactively trace predictions back to relevant patient data.
arXiv Detail & Related papers (2025-06-30T19:15:06Z) - Emotion Detection on User Front-Facing App Interfaces for Enhanced Schedule Optimization: A Machine Learning Approach [0.0]
We present and evaluate two complementary approaches to emotion detection.
A biometric-based method utilizing heart rate (HR) data extracted from electrocardiogram (ECG) signals to predict the emotional dimensions of Valence, Arousal, and Dominance; and a behavioral method analyzing computer activity through multiple machine learning models to classify emotions based on fine-grained user interactions such as mouse movements, clicks, and keystroke patterns.
Our comparative analysis, from real-world datasets, reveals that while both approaches demonstrate effectiveness, the computer activity-based method delivers superior consistency and accuracy, particularly for mouse-related interactions, which achieved approximately
arXiv Detail & Related papers (2025-06-24T03:21:46Z) - Towards user-centered interactive medical image segmentation in VR with an assistive AI agent [0.5578116134031106]
We propose SAMIRA, a novel conversational AI agent for medical VR that assists users with localizing, segmenting, and visualizing 3D medical concepts.
The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding.
With a user study, evaluations demonstrated a high usability score (SUS=90.0 $\pm$ 9.0), low overall task load, and strong support for the proposed VR system's guidance.
arXiv Detail & Related papers (2025-05-12T03:47:05Z) - MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care [1.5237145555729716]
We introduce MELON, a novel framework designed to predict 12-hour mobility status in the critical care setting.
We trained and evaluated the MELON model on the multimodal dataset of 126 patients recruited from nine Intensive Care Units at the University of Florida Health Shands Hospital main campus in Gainesville, Florida.
Results showed that MELON outperforms conventional approaches for 12-hour mobility status estimation.
arXiv Detail & Related papers (2025-03-10T19:47:46Z) - IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare [1.5236380958983642]
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients.
Human Motion Recognition (HMR) challenges such as high computational demands, low accuracy, and limited adaptability persist.
This study proposes a novel HMR method for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT.
arXiv Detail & Related papers (2025-01-13T03:41:57Z) - MANGO: Multimodal Acuity traNsformer for intelliGent ICU Outcomes [11.385654412265461]
We present MANGO: the Multimodal Acuity traNsformer for intelliGent ICU outcomes.
It is designed to enhance the prediction of patient acuity states, transitions, and the need for life-sustaining therapy.
arXiv Detail & Related papers (2024-12-13T23:51:15Z) - High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR [1.3810901729134184]
We introduce United-MedASR, a novel architecture that addresses challenges by integrating synthetic data generation, precision ASR fine-tuning, and semantic enhancement techniques.
United-MedASR constructs a specialised medical vocabulary by synthesising data from authoritative sources such as ICD-10, MIMS, and FDA databases.
To enhance processing speed, we incorporate Faster Whisper, ensuring streamlined and high-speed ASR performance.
arXiv Detail & Related papers (2024-11-24T17:02:48Z) - Early Recognition of Parkinson's Disease Through Acoustic Analysis and Machine Learning [0.0]
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that significantly impacts both motor and non-motor functions, including speech.
This paper provides a comprehensive review of methods for PD recognition using speech data, highlighting advances in machine learning and data-driven approaches.
Various classification algorithms are explored, including logistic regression, SVM, and neural networks, with and without feature selection.
Our findings indicate that specific acoustic features and advanced machine-learning techniques can effectively differentiate between individuals with PD and healthy controls.
arXiv Detail & Related papers (2024-07-22T23:24:02Z) - Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process [15.562905335917408]
Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior.
We use a neural process architecture to capture the probabilistic relationship between MCS pump levels and aortic pressure measurements with uncertainty.
Empirical results with an improvement of 19% in non-stationary trend prediction establish DANP as an effective tool for clinicians.
arXiv Detail & Related papers (2024-05-28T19:07:12Z) - Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features [50.82725748981231]
Engagement measurement finds application in healthcare, education, services.
Use of physiological and behavioral features is viable, but impracticality of traditional physiological measurement arises due to the need for contact sensors.
We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative to contact sensors.
arXiv Detail & Related papers (2024-04-05T20:39:16Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*.
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
arXiv Detail & Related papers (2022-12-13T19:06:00Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response [58.0291320452122]
This paper aims at a unified deep learning approach to predict patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv Detail & Related papers (2020-10-08T15:30:17Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.