PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech
Assistant using Interventional Radiology Workflow Analysis
- URL: http://arxiv.org/abs/2206.12320v1
- Date: Fri, 24 Jun 2022 14:39:11 GMT
- Title: PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech
Assistant using Interventional Radiology Workflow Analysis
- Authors: Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina
Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang
- Abstract summary: This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus.
This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons.
- Score: 7.189635716814341
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents a new multimodal interventional radiology dataset, the
PoCaP (Port Catheter Placement) Corpus. The corpus consists of speech and
audio signals in German, X-ray images, and system commands collected from 31
PoCaP interventions performed by six surgeons, with an average duration of 81.4 $\pm$ 41.0
minutes. The corpus aims to provide a resource for developing a smart speech
assistant in operating rooms. In particular, it may be used to develop a
speech-controlled system that enables surgeons to control operation parameters
such as C-arm movements and table positions. To record the dataset, we
obtained consent from the institutional review board and the workers council of
University Hospital Erlangen, as well as from the patients for data privacy. We describe
the recording set-up, data structure, workflow, and preprocessing steps, and
report the first PoCaP Corpus speech recognition results: an 11.52% word
error rate using pretrained models. The findings suggest that the data can
support a robust command recognition system and will allow the development of
novel intervention support systems using speech and image processing in the
medical domain.
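The 11.52% word error rate reported above is the standard ASR metric: the word-level Levenshtein (edit) distance between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of the computation (illustrative only, not the authors' evaluation code; the example commands are hypothetical):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical C-arm command, one deleted word out of five -> WER 0.2
print(wer("move the c arm up", "move the arm up"))
```

In practice a toolkit such as `jiwer` would be used for corpus-level scoring, where distances and reference lengths are summed over all utterances before dividing.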
Related papers
- A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning [11.817595076396925]
Diagnostic Captioning (DC) automatically generates a diagnostic text from one or more medical images of a patient.
We propose a new data-driven guided decoding method that incorporates medical information into the beam search of the diagnostic text generation process.
We evaluate the proposed method on two medical datasets using four DC systems that range from generic image-to-text systems with CNN encoders to pre-trained Large Language Models.
arXiv Detail & Related papers (2024-06-20T10:08:17Z)
- Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design [58.50329724298128]
This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications.
We release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments.
We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance.
arXiv Detail & Related papers (2024-06-14T03:06:55Z)
- RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [53.20640629352422]
Conversational AI tools can generate and discuss clinically correct radiology reports for a given medical image.
RaDialog is the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog.
Our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions.
arXiv Detail & Related papers (2023-11-30T16:28:40Z)
- Summarizing Patients Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models [9.879960506853145]
Problem list summarization requires a model to understand, abstract, and generate clinical documentation.
We propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization.
arXiv Detail & Related papers (2022-08-17T17:07:35Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- EasyCall corpus: a dysarthric speech dataset [4.6760299097922715]
This paper introduces a new dysarthric speech command dataset in Italian, called EasyCall corpus.
The dataset consists of 21386 audio recordings from 24 healthy and 31 dysarthric speakers, whose individual degree of speech impairment was assessed by neurologists.
The corpus aims at providing a resource for the development of ASR-based assistive technologies for patients with dysarthria.
arXiv Detail & Related papers (2021-04-06T14:32:47Z)
- Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
- Transforming unstructured voice and text data into insight for paramedic emergency service using recurrent and convolutional neural networks [68.8204255655161]
Paramedics often have to make lifesaving decisions within a limited time in an ambulance.
This study aims to automatically fuse voice and text data to provide tailored situational awareness information to paramedics.
arXiv Detail & Related papers (2020-05-30T06:47:02Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.