Voice EHR: Introducing Multimodal Audio Data for Health
- URL: http://arxiv.org/abs/2404.01620v3
- Date: Sat, 09 Nov 2024 17:22:08 GMT
- Title: Voice EHR: Introducing Multimodal Audio Data for Health
- Authors: James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Veronica Daoud, Dominique Ebedes, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, C. Louise Thwaites, Yael Bensoussan, Bradford Wood
- Abstract summary: Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries.
This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
- Score: 3.8090294667599927
- License:
- Abstract: Artificial intelligence (AI) models trained on audio data may have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact on health equity. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. The app facilitates the collection of an audio electronic health record (Voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and spoken language with semantic meaning and longitudinal context, potentially compensating for the typical limitations of unimodal clinical datasets. This report presents the application used for data collection, initial experiments on data quality, and case studies which demonstrate the potential of voice EHR to advance the scalability/diversity of audio AI.
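For concreteness, the sketch below shows one way a single voice EHR entry could be represented as a structured record pairing a guided-question recording with its transcript and longitudinal context. The schema and field names (`VoiceEHRRecord`, `guided_question_id`, etc.) are illustrative assumptions; the paper does not publish its storage format.

```python
# A minimal sketch of a hypothetical voice EHR record, assuming a schema
# that links guided-question audio to a transcript and longitudinal metadata.
# All field names are illustrative, not the paper's actual format.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class VoiceEHRRecord:
    patient_id: str                 # pseudonymous identifier
    session_time: datetime          # when the recording was captured
    guided_question_id: str         # which app prompt the patient answered
    audio_path: str                 # path/URI of the mobile/web recording
    transcript: str = ""            # ASR output carrying semantic content
    language: str = "en"            # supports multilingual collection
    prior_sessions: list[str] = field(default_factory=list)  # longitudinal context

record = VoiceEHRRecord(
    patient_id="p-0001",
    session_time=datetime(2024, 11, 9, 17, 22),
    guided_question_id="cough-history",
    audio_path="recordings/p-0001/cough-history.wav",
    transcript="The cough started about two weeks ago...",
)
print(record.guided_question_id, record.audio_path)
```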
Related papers
- RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction [20.974460332254544]
RespLLM is a novel framework that unifies text and audio representations for respiratory health prediction.
Our work lays the foundation for multimodal models that can perceive, listen, and understand heterogeneous data.
arXiv Detail & Related papers (2024-10-07T17:06:11Z)
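The RespLLM summary above describes unifying text and audio representations in one model. A common pattern for this, sketched below in PyTorch under assumed dimensions, is to project audio features into the text embedding space and let a transformer attend over the concatenated sequence; this is a generic illustration, not RespLLM's actual architecture.

```python
# Generic audio-text fusion sketch (assumed pattern, not RespLLM's design):
# project audio features into the text embedding space, then concatenate so
# a transformer can attend over both modalities jointly.
import torch
import torch.nn as nn

class AudioTextFusion(nn.Module):
    def __init__(self, audio_dim=128, text_dim=512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, text_dim)  # map audio into text space
        encoder_layer = nn.TransformerEncoderLayer(d_model=text_dim, nhead=8,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(text_dim, 2)  # e.g. abnormal vs. normal respiration

    def forward(self, audio_feats, text_embeds):
        # audio_feats: (batch, audio_frames, audio_dim)
        # text_embeds: (batch, text_tokens, text_dim)
        fused = torch.cat([self.audio_proj(audio_feats), text_embeds], dim=1)
        return self.head(self.encoder(fused).mean(dim=1))  # pooled prediction

model = AudioTextFusion()
logits = model(torch.randn(2, 50, 128), torch.randn(2, 16, 512))
print(logits.shape)  # torch.Size([2, 2])
```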
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-audio detection framework and benchmark.
It aims to provide a comprehensive evaluation of methods for distinguishing cutting-edge AI-synthesized audio content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI [0.0]
This study examines the adoption of large language models to address a specific challenge: the standardization of healthcare data.
Our results illustrate that employing large language models significantly reduces the need for manual data curation.
The proposed methodology can accelerate the integration of AI in healthcare and improve the quality of patient care, while minimizing the time and financial resources required to prepare data for AI.
arXiv Detail & Related papers (2024-08-16T20:51:21Z)
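The standardization workflow described above can be pictured as prompting an LLM to map free-text fragments onto a fixed schema. The sketch below assumes a generic `call_llm` function and an invented three-field schema; neither comes from the paper.

```python
# Sketch of LLM-assisted standardization of free-text clinical fields.
# `call_llm` is a placeholder for any chat-completion API; the prompt format
# and target schema here are illustrative assumptions, not the paper's.
import json

TARGET_FIELDS = ["diagnosis", "medication", "dose_mg"]

def build_prompt(raw_entry: str) -> str:
    return (
        "Normalize this clinical note fragment into JSON with keys "
        f"{TARGET_FIELDS}. Use null for missing values.\n"
        f"Fragment: {raw_entry}\nJSON:"
    )

def standardize(raw_entry: str, call_llm) -> dict:
    """Map one free-text entry to the standard schema via an LLM."""
    response = call_llm(build_prompt(raw_entry))
    record = json.loads(response)
    # Guard against hallucinated keys: keep only the expected fields.
    return {k: record.get(k) for k in TARGET_FIELDS}

# Stub LLM for demonstration; a real system would call an actual model.
fake_llm = lambda prompt: '{"diagnosis": "hypertension", "medication": "lisinopril", "dose_mg": 10}'
print(standardize("pt on lisinopril 10mg for HTN", fake_llm))
```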
- Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges for deploying intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z)
- BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata.
Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z)
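The metadata-to-text step that BTS relies on can be as simple as templating structured fields into a sentence for the text encoder. The sketch below uses invented metadata keys and template wording, not the paper's actual recipe.

```python
# Sketch of turning respiratory-sound metadata into a free-text description
# that a text-audio model could consume. Keys and wording are assumptions.
def metadata_to_text(meta: dict) -> str:
    parts = []
    if "age" in meta:
        parts.append(f"a {meta['age']}-year-old patient")
    if "location" in meta:
        parts.append(f"recorded at the {meta['location']}")
    if "device" in meta:
        parts.append(f"with a {meta['device']}")
    return "Respiratory sound from " + ", ".join(parts) + "."

meta = {"age": 63, "location": "left posterior chest", "device": "digital stethoscope"}
print(metadata_to_text(meta))
# -> Respiratory sound from a 63-year-old patient, recorded at the left
#    posterior chest, with a digital stethoscope.
```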
- README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the README dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
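The architecture described above, a CNN extracting visual features that a transformer encoder-decoder combines with demographic text embeddings, can be sketched as follows; the toy CNN, all dimensions, and the vocabulary size are placeholders rather than the paper's configuration.

```python
# Sketch of the described pattern: CNN visual features from a CXR plus text
# embeddings of demographics feed a transformer encoder-decoder that emits a
# report. Dimensions and vocab are placeholders, not the paper's exact model.
import torch
import torch.nn as nn

class ReportGenerator(nn.Module):
    def __init__(self, d_model=256, vocab_size=1000):
        super().__init__()
        self.cnn = nn.Sequential(                       # toy CXR feature extractor
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(2))     # -> (batch, 32, 16)
        self.vis_proj = nn.Linear(16, d_model)
        self.demo_embed = nn.Embedding(100, d_model)    # tokenized demographics
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, cxr, demo_tokens, report_tokens):
        vis = self.vis_proj(self.cnn(cxr))                       # visual features
        memory = torch.cat([vis, self.demo_embed(demo_tokens)], dim=1)
        hidden = self.transformer(memory, self.tok_embed(report_tokens))
        return self.out(hidden)                                  # next-token logits

model = ReportGenerator()
logits = model(torch.randn(2, 1, 64, 64),          # CXR images
               torch.randint(0, 100, (2, 4)),      # demographic tokens
               torch.randint(0, 1000, (2, 20)))    # report tokens so far
print(logits.shape)  # torch.Size([2, 20, 1000])
```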
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed in specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
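Resilience to modality dropping at test-time is typically obtained by randomly hiding modalities during training. The sketch below illustrates that general idea with a shared representation averaged over whichever encoders are available; it shows the training pattern only, not CMIM's actual information-maximization objective.

```python
# Sketch of training with random modality dropping so a shared representation
# stays usable when a modality is missing at test-time. Illustrates the
# general idea behind CMIM, not its exact objective or architecture.
import random
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    def __init__(self, dim_a=32, dim_b=48, d_shared=64):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, d_shared)   # e.g. imaging modality
        self.enc_b = nn.Linear(dim_b, d_shared)   # e.g. report/text modality

    def forward(self, x_a=None, x_b=None):
        # Average whichever modality encodings are present.
        parts = []
        if x_a is not None:
            parts.append(self.enc_a(x_a))
        if x_b is not None:
            parts.append(self.enc_b(x_b))
        return torch.stack(parts).mean(dim=0)

model = MultiModalEncoder()
x_a, x_b = torch.randn(8, 32), torch.randn(8, 48)
for step in range(3):
    drop = random.choice(["none", "a", "b"])   # randomly hide one view in training
    z = model(x_a=None if drop == "a" else x_a,
              x_b=None if drop == "b" else x_b)
    print(step, drop, z.shape)                 # a representation exists either way
```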
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.