Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in
the HYKIST Project
- URL: http://arxiv.org/abs/2309.15869v1
- Date: Tue, 26 Sep 2023 21:12:09 GMT
- Title: Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in
the HYKIST Project
- Authors: Khai Le-Duc
- Abstract summary: Language difficulties between natives and immigrants present a common issue on a daily basis, especially in the medical domain.
The goal of the HYKIST Project is to develop a speech translation system to support patient-doctor communication with ASR and MT.
We describe our efforts to construct ASR systems for a conversational telephone speech recognition task in the medical domain for the Vietnamese language.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's interconnected world, moving abroad is increasingly common,
whether for employment, refugee resettlement, or other reasons. Language
difficulties between natives and immigrants present a common issue on a daily
basis, especially in the medical domain. This can make it difficult for patients
and doctors to communicate during anamnesis or in the emergency room, which
compromises patient care. The goal of the HYKIST Project is to develop a speech
translation system to support patient-doctor communication with ASR and MT.
ASR systems have recently displayed astounding performance on particular
tasks for which sufficient training data is available, such as
LibriSpeech. Building a good model is still difficult due to the variety of
speaking styles, acoustic and recording settings, and a lack of in-domain
training data. In this thesis, we describe our efforts to construct ASR systems
for a conversational telephone speech recognition task in the medical domain
for the Vietnamese language, to assist emergency room communication between doctors and
patients across linguistic barriers. In order to enhance the system's
performance, we investigate various training schedules and data combining
strategies. We also examine how best to make use of the little data that is
available. The use of publicly accessible models like XLSR-53 is compared to
the use of customized pre-trained models, and both supervised and unsupervised
approaches are applied, using wav2vec 2.0 as the architecture.
Related papers
- Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO [0.13108652488669734]
Individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS) frequently face challenges with articulation, leading to dysarthria and resulting in atypical speech patterns.
In healthcare settings, communication breakdowns reduce the quality of care.
We found that state-of-the-art (SOTA) automatic speech recognition (ASR) technology like Whisper and Wav2vec2.0 marginalizes atypical speakers largely due to the lack of training data.
arXiv Detail & Related papers (2024-11-01T19:11:54Z)
- MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder [1.220481237642298]
We introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain.
We present the first reproducible study of multilinguality in medical ASR, conduct a layer-wise ablation study for end-to-end ASR training, and provide the first linguistic analysis for multilingual medical ASR.
arXiv Detail & Related papers (2024-09-21T09:05:48Z)
- Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design [58.50329724298128]
This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications.
We release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments.
We also develop a customized dysarthria WWS system that is robust to variation in speech intelligibility and achieves exceptional performance.
arXiv Detail & Related papers (2024-06-14T03:06:55Z)
- Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech [33.170046744835595]
In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietnamese-speaking patient.
The HYKIST goal is to support the usually non-professional bilingual interpreter with an automatic speech translation system to improve patient care and help overcome language barriers.
In this work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two language pairs, covering data collection, various acoustic model architectures, and dialect-induced difficulties.
arXiv Detail & Related papers (2022-10-24T16:49:19Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Clinical Dialogue Transcription Error Correction using Seq2Seq Models [1.663938381339885]
We present a seq2seq learning approach for ASR transcription error correction of clinical dialogues.
We fine-tune a seq2seq model on a mask-filling task using a domain-specific dataset which we have shared publicly for future research.
arXiv Detail & Related papers (2022-05-26T18:27:17Z)
- Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition [3.2631198264090746]
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide.
We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations.
arXiv Detail & Related papers (2022-04-01T14:05:02Z)
- Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer [63.72621204057025]
Expert-layman text style transfer technologies have the potential to improve communication between scientific communities and the general public.
High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
arXiv Detail & Related papers (2021-10-06T17:57:22Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Transforming unstructured voice and text data into insight for paramedic emergency service using recurrent and convolutional neural networks [68.8204255655161]
Paramedics often have to make lifesaving decisions within a limited time in an ambulance.
This study aims to automatically fuse voice and text data to provide tailored situational awareness information to paramedics.
arXiv Detail & Related papers (2020-05-30T06:47:02Z)
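Several entries above report ASR quality as word error rate (WER) and character error rate (CER). As a rough illustration of how these metrics are commonly defined (edit distance between reference and hypothesis, divided by the reference length), here is a minimal Python sketch. The function names are hypothetical, and the normalization choices (whitespace tokenization for WER, counting spaces as characters for CER) are assumptions rather than the scoring setup used in any of the listed papers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace, with no further
    text normalization.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length.

    Assumption: spaces are counted as ordinary characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("bệnh nhân bị sốt", "bệnh nhân sốt"))  # 0.25
```

Because insertions are counted, both rates can exceed 100% when the hypothesis is much longer than the reference.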
This list is automatically generated from the titles and abstracts of the papers in this site.