Voice EHR: Introducing Multimodal Audio Data for Health
- URL: http://arxiv.org/abs/2404.01620v2
- Date: Sat, 1 Jun 2024 19:44:34 GMT
- Title: Voice EHR: Introducing Multimodal Audio Data for Health
- Authors: James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, C. Louise Thwaites, Yael Bensoussan, Bradford Wood,
- Abstract summary: This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning.
- Score: 3.876405146656873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. This application ultimately results in an audio electronic health record (voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and language with semantic meaning - compensating for the typical limitations of unimodal clinical datasets. This report introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.
Related papers
- RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction [20.974460332254544]
RespLLM is a novel framework that unifies text and audio representations for respiratory health prediction.
Our work lays the foundation for multimodal models that can perceive, listen, and understand heterogeneous data.
arXiv Detail & Related papers (2024-10-07T17:06:11Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata.
Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created the dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality.
arXiv Detail & Related papers (2023-12-24T23:01:00Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic
Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z) - A Methodology for a Scalable, Collaborative, and Resource-Efficient
Platform to Facilitate Healthcare AI Research [0.0]
We present a system to accelerate data acquisition, dataset development and analysis, and AI model development.
This system can ingest 15,000 patient records per hour, where each record represents thousands of measurements, text notes, and high resolution data.
arXiv Detail & Related papers (2021-12-13T18:39:10Z) - Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.