Related papers: Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts

PRiSM: Benchmarking Phone Realization in Speech Models [70.82595415252682]
Phone recognition (PR) serves as the atomic interface for language-agnostic modeling for cross-lingual speech processing and phonetic analysis. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception.
arXiv Detail & Related papers (2026-01-20T15:00:36Z)
Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation [66.36556189794526]
TTARAG is a test-time adaptation method that dynamically updates the language model's parameters during inference to improve RAG system performance in specialized domains. Our method introduces a simple yet effective approach where the model learns to predict retrieved content, enabling automatic parameter adjustment to the target domain.
arXiv Detail & Related papers (2026-01-16T17:07:01Z)
Efficient Multilingual ASR Finetuning via LoRA Language Experts [59.27778147311189]
This paper proposes an efficient finetuning framework for customized multilingual ASR via prepared LoRA language experts based on Whisper. Through LoRA expert fusion or knowledge distillation, our approach achieves better recognition performance on target languages than standard fine-tuning methods. Experimental results demonstrate that the proposed models yield approximately 10% and 15% relative performance gains in language-aware and language-agnostic scenarios.
arXiv Detail & Related papers (2025-06-11T07:06:27Z)
PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems [0.0]
This paper introduces Persian Speech Recognition Benchmark (PSRB), a comprehensive benchmark designed to address this gap by incorporating diverse linguistic and acoustic conditions. We evaluate ten ASR systems, including state-of-the-art commercial and open-source models, to examine performance variations and inherent biases. Our findings indicate that while ASR models generally perform well on standard Persian, they struggle with regional accents, children's speech, and specific linguistic challenges.
arXiv Detail & Related papers (2025-05-27T14:14:55Z)
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems [8.669397145785942]
We propose Generative Error Correction via Retrieval-Augmented Generation (GEC-RAG) to improve ASR accuracy for low-resource domains, like Persian. GEC-RAG retrieves lexically similar examples to the ASR transcription using the Term Frequency-Inverse Document Frequency (TF-IDF) measure.
arXiv Detail & Related papers (2025-01-18T11:53:22Z)
Fotheidil: an Automatic Transcription System for the Irish Language [6.87666483638516]
Fotheidil is the first web-based transcription system for the Irish language. It uses speech-related AI technologies as part of the ABAIR initiative.
arXiv Detail & Related papers (2024-12-31T15:44:30Z)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
Advocating Character Error Rate for Multilingual ASR Evaluation [1.2597747768235845]
We document the limitations of the word error rate (WER) as an evaluation metric and advocate for the character error rate (CER) as the primary metric. We show that CER avoids many of the challenges WER faces and exhibits greater consistency across writing systems. Our findings suggest that CER should be prioritized, or at least supplemented, in multilingual ASR evaluations to account for the varying linguistic characteristics of different languages.
arXiv Detail & Related papers (2024-10-09T19:57:07Z)
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance. We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information. Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese. We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions [1.3791394805787949]
We propose a method to utilize the state-of-the-art Whisper without modifying its architecture. We also propose two additional training techniques to improve the domain specific ASR. Our experiments demonstrate that proposed methods notably enhance domain-specific ASR accuracy on real-life datasets.
arXiv Detail & Related papers (2024-07-25T08:44:04Z)
XLS-R Deep Learning Model for Multilingual ASR on Low-Resource Languages: Indonesian, Javanese, and Sundanese [0.0]
The study aims to improve ASR performance in converting spoken language into written text, specifically for Indonesian, Javanese, and Sundanese languages. The results show that the XLS-R 300m model achieves competitive Word Error Rate (WER) measurements, with a slight compromise in performance for Javanese and Sundanese languages.
arXiv Detail & Related papers (2024-01-12T13:44:48Z)
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition [10.244515100904144]
In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. We developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. Our results demonstrate the efficacy of the model trained on pseudo-label data for the designed test set along with publicly available Bangla datasets.
arXiv Detail & Related papers (2023-11-06T15:37:14Z)
End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements. All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z)
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models. We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
WER we are and WER we think we are [11.819335591315316]
We express skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and HUB'05 public benchmark. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high quality annotations for training and testing of robust ASR systems.
arXiv Detail & Related papers (2020-10-07T14:20:31Z)
DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation [64.44349061520671]
In this paper, we propose an ASR approach with efficient gradient-based architecture search, DARTS-ASR. In order to examine the generalizability of DARTS-ASR, we apply our approach not only on many languages to perform monolingual ASR, but also on a multilingual ASR setting.
arXiv Detail & Related papers (2020-05-13T11:32:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.