PIER: A Novel Metric for Evaluating What Matters in Code-Switching
- URL: http://arxiv.org/abs/2501.09512v2
- Date: Tue, 21 Jan 2025 10:39:27 GMT
- Authors: Enes Yavuz Ugan, Ngoc-Quan Pham, Leonard Bärmann, Alex Waibel
- Abstract summary: Code-switching is a significant challenge for Automatic Speech Recognition.
General metrics such as Word-Error-Rate (WER) are commonly used to measure performance.
We propose Point-of-Interest Error Rate (PIER), a variant of WER that focuses only on specific words of interest.
- Score: 15.370845263369347
- Abstract: Code-switching, the alternation of languages within a single discourse, presents a significant challenge for Automatic Speech Recognition. Despite the unique nature of the task, performance is commonly measured with established metrics such as Word-Error-Rate (WER). However, in this paper, we question whether these general metrics accurately assess performance on code-switching. Specifically, using both Connectionist-Temporal-Classification and Encoder-Decoder models, we show that fine-tuning on non-code-switched data from both the matrix and embedded language improves classical metrics on code-switching test sets, even though performance on the actual code-switched words worsens (as expected). Therefore, we propose Point-of-Interest Error Rate (PIER), a variant of WER that focuses only on specific words of interest. We instantiate PIER on code-switched utterances and show that it describes code-switching performance more accurately, revealing substantial room for improvement in future work. This focused evaluation allows for a more precise assessment of model performance, particularly in challenging aspects such as inter-word and intra-word code-switching.
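Since PIER is described as WER restricted to words of interest, the idea can be sketched as a standard word-level edit-distance alignment in which only errors touching the flagged reference words are counted. The following is an illustrative sketch, not the authors' implementation: the handling of insertions (which attach to no reference word) is a simplifying assumption here and may differ from the paper's exact definition.

```python
from typing import List, Set

def pier(ref: List[str], hyp: List[str], poi: Set[str]) -> float:
    """Point-of-Interest Error Rate sketch: error rate computed only
    over reference words in `poi` (e.g. the code-switched words).
    Insertions are ignored for simplicity; the paper's exact
    treatment may differ."""
    n, m = len(ref), len(hyp)
    # Standard Levenshtein dynamic-programming table over words.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/substitution
    # Backtrack and count only errors on reference words of interest.
    errors, i, j = 0, n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] != hyp[j - 1] and ref[i - 1] in poi:
                errors += 1   # substitution of a word of interest
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            if ref[i - 1] in poi:
                errors += 1   # deletion of a word of interest
            i -= 1
        else:
            j -= 1            # insertion: not tied to a reference word
    poi_count = sum(1 for w in ref if w in poi)
    return errors / poi_count if poi_count else 0.0
```

On a toy German/English utterance, a single error on an embedded English word yields PIER = 1/3 (one wrong word out of three points of interest), while plain WER over the whole utterance would report only 1/5, illustrating how a general metric can mask code-switching errors.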
Related papers
- Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding [27.499426765845705]
Code-switching automatic speech recognition (ASR) faces challenges due to the language confusion resulting from accents, auditory similarity, and seamless language switches.
We adapt Whisper, which is a large-scale multilingual pre-trained speech recognition model, to CS from both encoder and decoder parts.
arXiv Detail & Related papers (2024-12-21T07:06:44Z) - Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z) - CodeScore-R: An Automated Robustness Metric for Assessing the FunctionalCorrectness of Code Synthesis [17.747095451792084]
We propose an automated robust metric, called CodeScore-R, for evaluating the functionality of code synthesis.
In the tasks of code generation and migration in Java and Python, CodeScore-R outperforms other metrics.
arXiv Detail & Related papers (2024-06-11T02:51:17Z) - Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages [49.6922490267701]
We introduce a new zero resource code-switched speech benchmark designed to assess the code-switching capabilities of self-supervised speech encoders.
We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed.
arXiv Detail & Related papers (2023-10-04T17:58:11Z) - Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition [19.763431520942028]
We develop a benchmark data set of code-switching speech recognition hypotheses with human judgments.
We define clear guidelines for minimal editing of automatic hypotheses.
We release the first corpus for human acceptance of code-switching speech recognition results in dialectal Arabic/English conversation speech.
arXiv Detail & Related papers (2022-11-22T08:14:07Z) - Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation [10.650573361117669]
Semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech.
Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set.
arXiv Detail & Related papers (2022-10-21T19:42:41Z) - Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z) - CodeBLEU: a Method for Automatic Evaluation of Code Synthesis [57.87741831987889]
In the area of code synthesis, the commonly used evaluation metric is BLEU or perfect accuracy.
We introduce a new automatic evaluation metric, dubbed CodeBLEU.
It absorbs the strength of BLEU in the n-gram match and further injects code syntax via abstract syntax trees (AST) and code semantics via data-flow.
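The summary describes CodeBLEU as combining four signals: standard n-gram match, keyword-weighted n-gram match, AST match, and data-flow match. The published metric combines these component scores as a weighted sum; a minimal sketch follows, with equal weights assumed as the common default (component scoring itself is out of scope here):

```python
def codebleu(bleu: float, weighted_bleu: float,
             ast_match: float, dataflow_match: float,
             weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted combination of the four CodeBLEU component scores,
    each assumed to lie in [0, 1]. Equal weights are a common default;
    the original paper tunes them per task."""
    a, b, c, d = weights
    return (a * bleu + b * weighted_bleu
            + c * ast_match + d * dataflow_match)
```

A candidate that matches the reference's syntax tree and data flow can thus score well even when surface n-gram overlap is modest, which is the motivation for going beyond plain BLEU.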
arXiv Detail & Related papers (2020-09-22T03:10:49Z) - LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation [13.947879344871442]
We propose a benchmark for Linguistic Code-switching Evaluation (LinCE).
LinCE combines ten corpora covering four different code-switched language pairs.
We provide the scores of different popular models, including LSTM, ELMo, and multilingual BERT.
arXiv Detail & Related papers (2020-05-09T00:00:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.