Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation
- URL: http://arxiv.org/abs/2404.19310v2
- Date: Thu, 9 May 2024 11:54:04 GMT
- Authors: Eyal Liron Dolev, Clemens Fidel Lutz, Noëmi Aepli,
- Abstract summary: Whisper is a state-of-the-art automatic speech recognition (ASR) model.
We evaluate Whisper's performance on Swiss German using automatic, qualitative, and human evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Schönberger et al., 2021), STT4SG-350 (Plüss et al., 2023), and the Swiss Parliaments Corpus (Plüss et al., 2021). In addition, we create a new test set for this work, based on short mock clinical interviews. For automatic evaluation, we used word error rate (WER) and BLEU. In the qualitative analysis, we discuss Whisper's strengths and weaknesses and analyze some output examples. For the human evaluation, we conducted a survey with 28 participants who were asked to evaluate Whisper's performance. All of our evaluations suggest that Whisper is a viable ASR system for Swiss German, so long as Standard German output is desired.
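The WER metric used for the automatic evaluation above can be sketched in a few lines of Python. This is a minimal, generic formulation (word-level Levenshtein distance divided by reference length), not the paper's actual evaluation code, and the example sentences are illustrative rather than taken from the test sets:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("das ist ein test", "das ist ein test"))  # 0.0
print(wer("das ist ein test", "das war ein test"))  # 0.25 (1 substitution / 4 words)
```

Note that WER is typically computed against the Standard German reference here, which is why BLEU, a translation metric, is also reported: the Swiss German-to-Standard German output is closer to a translation than a verbatim transcript.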
Related papers
- Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect [52.1701152610258]
Adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance.
For the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies.
arXiv Detail & Related papers (2024-01-25T18:59:32Z)
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z)
- STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions [5.6787416472329495]
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech annotated with Standard German text at the sentence level.
The data is collected using a web app in which the speakers are shown Standard German sentences, which they translate to Swiss German and record.
It contains 343 hours of speech from all dialect regions and is the largest public speech corpus for Swiss German to date.
arXiv Detail & Related papers (2023-05-30T08:49:38Z)
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization [61.60501633397704]
We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering.
We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts.
Experiments show that our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets.
arXiv Detail & Related papers (2023-05-18T16:32:58Z)
- SwissBERT: The Multilingual Language Model for Switzerland [52.1701152610258]
SwissBERT is a masked language model created specifically for processing Switzerland-related text.
SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland.
Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work.
arXiv Detail & Related papers (2023-03-23T14:44:47Z)
- 2nd Swiss German Speech to Standard German Text Shared Task at SwissText 2022 [3.910747992453137]
The objective was to maximize the BLEU score on a test set of Grisons speech.
Three teams participated, with the best-performing system achieving a BLEU score of 70.1.
arXiv Detail & Related papers (2023-01-17T10:31:11Z)
- BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric [66.73705349465207]
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics.
We propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.
arXiv Detail & Related papers (2022-12-16T14:00:26Z)
- SDS-200: A Swiss German Speech to Standard German Text Corpus [5.370317759946287]
We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations.
The data was collected using a web recording tool that is open to the public.
The data consists of 200 hours of speech by around 4000 different speakers and covers a large part of the Swiss-German dialect landscape.
arXiv Detail & Related papers (2022-05-19T12:16:29Z)
- Dialectal Speech Recognition and Translation of Swiss German Speech to Standard German Text: Microsoft's Submission to SwissText 2021 [17.675379299410054]
Swiss German refers to the multitude of Alemannic dialects spoken in the German-speaking parts of Switzerland.
We propose a hybrid automatic speech recognition system with a lexicon that incorporates translations.
Our submission reaches 46.04% BLEU on a blind conversational test set and outperforms the second best competitor by a 12% relative margin.
arXiv Detail & Related papers (2021-06-15T13:34:02Z)
- The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task [125.06737861979299]
This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions.
Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al.
We ensemble our best-performing systems and reach a BLEU score of 32.4 on German->Upper Sorbian and 35.2 on Upper Sorbian->German.
arXiv Detail & Related papers (2020-10-25T19:04:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.