Development of Hybrid ASR Systems for Low Resource Medical Domain
Conversational Telephone Speech
- URL: http://arxiv.org/abs/2210.13397v4
- Date: Fri, 22 Sep 2023 10:15:15 GMT
- Title: Development of Hybrid ASR Systems for Low Resource Medical Domain
Conversational Telephone Speech
- Authors: Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter and Hermann Ney
- Abstract summary: In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietnamese-speaking patient.
The HYKIST goal is to support the usually non-professional bilingual interpreter with an automatic speech translation system to improve patient care and help overcome language barriers.
- In this work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two language pairs, covering data collection, various acoustic model architectures, and dialect-induced difficulties.
- Score: 33.170046744835595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language barriers present a great challenge in our increasingly connected and
global world. Especially in the medical domain, e.g. in a hospital or emergency room, communication difficulties and delays may lead to malpractice and
non-optimal patient care. In the HYKIST project, we consider patient-physician
communication, more specifically between a German-speaking physician and an
Arabic- or Vietnamese-speaking patient. Currently, a doctor can call the
Triaphon service to get assistance from an interpreter in order to help
facilitate communication. The HYKIST goal is to support the usually
non-professional bilingual interpreter with an automatic speech translation
system to improve patient care and help overcome language barriers. In this
work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two language pairs, covering data collection, various acoustic model architectures, and dialect-induced difficulties.
Related papers
- Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia [0.0]
This paper proposes a solution using a localized large language model (LLM) to transcribe, translate, and summarize doctor-patient conversations.
We utilize the Whisper model for transcription and GPT-3 to summarize the conversations into the ePuskesmas medical records format.
This innovation addresses challenges like overcrowded facilities and the administrative burden on healthcare providers in Indonesia.
arXiv Detail & Related papers (2024-09-25T16:13:42Z)
- MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder [1.220481237642298]
We introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain.
We present the first reproducible study of multilinguality in medical ASR, conduct a layer-wise ablation study for end-to-end ASR training, and provide the first linguistic analysis for multilingual medical ASR.
arXiv Detail & Related papers (2024-09-21T09:05:48Z)
- GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment [72.96949760114575]
We propose Goal-Oriented Mental Alignment (GOMA), a novel cooperative communication framework.
GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the parts of agents' mental states that are relevant to their goals.
We evaluate our approach against strong baselines in two challenging environments: Overcooked (a multiplayer game) and VirtualHome (a household simulator).
arXiv Detail & Related papers (2024-03-17T03:52:52Z)
- Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset [26.504409173684653]
We introduce "ChatCoach", a human-AI cooperative framework designed to assist medical learners in practicing their communication skills during patient consultations.
ChatCoach differentiates itself from conventional dialogue systems by offering a simulated environment where medical learners can practice dialogues with a patient agent while a coach agent provides immediate, structured feedback.
We have developed a dataset specifically for evaluating Large Language Models (LLMs) within the ChatCoach framework on communicative medical coaching tasks.
arXiv Detail & Related papers (2024-02-08T10:32:06Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project [0.0]
Language difficulties between natives and immigrants are a common daily issue, especially in the medical domain.
The goal of the HYKIST Project is to develop a speech translation system to support patient-doctor communication with ASR and MT.
We describe our efforts to construct ASR systems for a conversational telephone speech recognition task in the medical domain for the Vietnamese language.
arXiv Detail & Related papers (2023-09-26T21:12:09Z)
- Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model [29.982507402325396]
We built an LLM-powered communication system, Talk2Care, for older adults and healthcare providers.
For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection.
The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time.
arXiv Detail & Related papers (2023-09-17T19:46:03Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data-hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Transforming unstructured voice and text data into insight for paramedic emergency service using recurrent and convolutional neural networks [68.8204255655161]
Paramedics often have to make lifesaving decisions within a limited time in an ambulance.
This study aims to automatically fuse voice and text data to provide tailored situational awareness information to paramedics.
arXiv Detail & Related papers (2020-05-30T06:47:02Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.