A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning
- URL: http://arxiv.org/abs/2511.13078v1
- Date: Mon, 17 Nov 2025 07:27:52 GMT
- Title: A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning
- Authors: Liuyi Jin, Pasan Gunawardena, Amran Haroon, Runzhi Wang, Sangwoo Lee, Radu Stoleru, Michael Middleton, Zepeng Huo, Jeeeun Kim, Jason Moats
- Abstract summary: We present EMSGlass, a smart-glasses system powered by EMSNet, and EMSServe, a low-latency multimodal serving framework tailored to EMS scenarios. EMSNet integrates text, vital signs, and scene images to construct a unified real-time understanding of EMS incidents. EMSServe achieves a 1.9x--11.7x speedup over direct PyTorch multimodal inference.
- Score: 7.284746127785293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emergency Medical Technicians (EMTs) operate in high-pressure environments, making rapid, life-critical decisions under heavy cognitive and operational loads. We present EMSGlass, a smart-glasses system powered by EMSNet, the first multimodal multitask model for Emergency Medical Services (EMS), and EMSServe, a low-latency multimodal serving framework tailored to EMS scenarios. EMSNet integrates text, vital signs, and scene images to construct a unified real-time understanding of EMS incidents. Trained on real-world multimodal EMS datasets, EMSNet simultaneously supports up to five critical EMS tasks with superior accuracy compared to state-of-the-art unimodal baselines. Built on top of PyTorch, EMSServe introduces a modality-aware model splitter and a feature caching mechanism, achieving adaptive and efficient inference across heterogeneous hardware while addressing the challenge of asynchronous modality arrival in the field. By optimizing multimodal inference execution in EMS scenarios, EMSServe achieves 1.9x -- 11.7x speedup over direct PyTorch multimodal inference. A user study evaluation with six professional EMTs demonstrates that EMSGlass enhances real-time situational awareness, decision-making speed, and operational efficiency through intuitive on-glass interaction. In addition, qualitative insights from the user study provide actionable directions for extending EMSGlass toward next-generation AI-enabled EMS systems, bridging multimodal intelligence with real-world emergency response workflows.
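The abstract describes EMSServe's feature caching mechanism for asynchronous modality arrival: encoder features are computed and cached as each modality arrives, so fusion never forces recomputation of already-extracted features. Below is a minimal sketch of that idea under stated assumptions; the class and encoder names are hypothetical and this is not the EMSServe implementation.

```python
# Hypothetical sketch of per-modality feature caching for asynchronous
# modality arrival (assumption; not the actual EMSServe code).
from typing import Any, Callable, Dict


class FeatureCache:
    def __init__(self, encoders: Dict[str, Callable[[Any], Any]]):
        # One encoder per modality, e.g. {"text": ..., "vitals": ..., "image": ...}
        self.encoders = encoders
        self.features: Dict[str, Any] = {}

    def put(self, modality: str, raw_input: Any) -> None:
        # Encode once when the modality arrives; cache the result so a
        # late-arriving modality never triggers recomputation of the others.
        self.features[modality] = self.encoders[modality](raw_input)

    def ready(self) -> bool:
        # True once every registered modality has arrived.
        return set(self.features) == set(self.encoders)

    def fuse(self, fusion: Callable[[Dict[str, Any]], Any]) -> Any:
        # Fuse whatever has arrived so far (partial inference is possible).
        return fusion(self.features)


# Toy usage with stand-in encoders and a concatenating "fusion" head.
cache = FeatureCache({
    "text":   lambda s: [len(s)],
    "vitals": lambda v: [sum(v)],
})
cache.put("text", "chest pain")    # text arrives first
cache.put("vitals", [80, 120])     # vitals arrive later
fused = cache.fuse(lambda f: f["text"] + f["vitals"])
```

In a real serving framework, `put` would be triggered by network callbacks and the cached tensors would live on the device running each encoder; the sketch only illustrates the cache-then-fuse control flow.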
Related papers
- MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning [53.37068897861388]
MedSAM-Agent is a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. We develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification. Experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-03T09:47:49Z) - Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services [5.251367919801455]
We present TeleEMS, a mobile live video analytics system that enables pre-arrival multimodal inference. The TeleEMS Client runs across phones, smart glasses, and desktops to support bystanders, EMTs en route, and 911 dispatchers. On top of EMSStream, the server hosts three real-time analytics modules: audio-to-symptom analytics via EMSLlama and video-to-vital analytics using state-of-the-art methods for heart rate estimation.
arXiv Detail & Related papers (2025-11-18T04:12:21Z) - EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services [3.0776354206437664]
EgoEMS is the first end-to-end, high-fidelity, multimodal, multiperson dataset capturing over 20 hours of realistic, procedural EMS activities. Developed in collaboration with EMS experts and aligned with national standards, EgoEMS is captured using an open-source, low-cost, and replicable data collection system. We present a suite of benchmarks for real-time multimodal keystep recognition and action quality estimation, essential for developing AI support tools for EMS.
arXiv Detail & Related papers (2025-11-13T02:55:40Z) - Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges [13.53016942028838]
Large language models (LLMs) are capable of using natural language to integrate information, follow instructions, and perform forms of "reasoning" and planning. With its multimodal data streams and orchestrated workflows spanning multiple systems, radiology is uniquely suited to benefit from agents that can adapt to context and automate repetitive yet complex tasks. This review examines the design of such LLM agentic systems, highlights key applications, discusses evaluation methods for planning and tool use, and outlines challenges such as error cascades, tool-use efficiency, and health IT integration.
arXiv Detail & Related papers (2025-10-10T13:56:27Z) - MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning [82.14973479594367]
Large Language Models (LLMs) for complex reasoning tasks require innovative approaches that bridge intuitive and deliberate cognitive processes. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless integration of System 1's fast, intuitive thinking with System 2's deliberate reasoning.
arXiv Detail & Related papers (2025-10-06T15:42:55Z) - Medchain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence [68.05876437208505]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow. We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and a MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z) - STLLM-DF: A Spatial-Temporal Large Language Model with Diffusion for Enhanced Multi-Mode Traffic System Forecasting [32.943673568195315]
We propose the Spatial-Temporal Large Language Model (STLLM-DF) to improve multi-task transportation prediction.
The DDPM's robust denoising capabilities enable it to recover underlying data patterns from noisy inputs.
We show that STLLM-DF consistently outperforms existing models, achieving an average reduction of 2.40% in MAE, 4.50% in RMSE, and 1.51% in MAPE.
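The STLLM-DF results above are reported as relative reductions in MAE, RMSE, and MAPE. For readers unfamiliar with these error metrics, here is a short sketch of their standard definitions on toy data (definitions only; not the paper's evaluation code, and the values below are illustrative assumptions):

```python
# Standard regression error metrics cited in the result above.
import math


def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more heavily than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Mean absolute percentage error; assumes no zero ground-truth values.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ground truth and predictions (hypothetical numbers).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
```

A "2.40% reduction in MAE" then means the model's MAE is 2.40% lower than the baseline's on the same test set.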
arXiv Detail & Related papers (2024-09-08T15:29:27Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - Real-Time Multimodal Cognitive Assistant for Emergency Medical Services [4.669165383466683]
This paper presents CognitiveEMS, an end-to-end wearable cognitive assistant system.
It can act as a collaborative virtual partner engaging in the real-time acquisition and analysis of multimodal data from an emergency scene.
arXiv Detail & Related papers (2024-03-11T13:56:57Z) - Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition [81.2011058113579]
We argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps.
We propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration.
Our system achieves new state-of-the-art performance consistently.
arXiv Detail & Related papers (2023-08-08T18:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.