The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
- URL: http://arxiv.org/abs/2409.08103v3
- Date: Tue, 07 Jan 2025 15:32:33 GMT
- Title: The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
- Authors: Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar
- Abstract summary: Faetar has no standard orthography and has virtually no existing textual or speech resources other than what is included in the benchmark. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%.
- Score: 4.077418516695122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Provençal variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms of Franco-Provençal. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions, and for which forced alignment is of variable quality. The corpus contains an additional 20 hrs of unlabelled speech. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%, using a pipeline that continues pre-training on the foundation model using the unlabelled set.
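
The abstract reports its headline result as a phone error rate (PER). As a rough illustration of how such a metric is commonly computed, the sketch below divides the total Levenshtein edit distance between reference and hypothesis phone sequences by the total number of reference phones. This is a minimal sketch under stated assumptions: the function names `edit_distance` and `phone_error_rate` are hypothetical, the phone symbols are invented toy data rather than Faetar transcriptions, and the benchmark's own scoring procedure may differ (for example in normalization or in how utterances are aggregated).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phone symbols."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER (assumed aggregation): total edits / total reference phones."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phones = sum(len(r) for r in refs)
    return total_edits / total_phones


# Toy data with made-up phone symbols (not from the benchmark).
refs = [["a", "b", "c", "d"], ["e", "f"]]
hyps = [["a", "x", "c"], ["e", "f", "g"]]
per = phone_error_rate(refs, hyps)
print(f"PER = {per:.1%}")
```

In the toy data, the first utterance contributes one substitution and one deletion and the second contributes one insertion, so three edits are counted against six reference phones. Because insertions count as errors, a PER computed this way can exceed 100%.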
 
      
Related papers
- SpeechR: A Benchmark for Speech Reasoning in Large Audio-Language Models [60.72029578488467]
 SpeechR is a unified benchmark for evaluating reasoning over speech in large audio-language models.
It evaluates models along three key dimensions: factual retrieval, procedural inference, and normative judgment.
Evaluations on eleven state-of-the-art LALMs reveal that high transcription accuracy does not translate into strong reasoning capabilities.
 arXiv  Detail & Related papers  (2025-08-04T03:28:04Z)
- Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
 Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs.
It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
 arXiv  Detail & Related papers  (2025-05-26T07:21:20Z)
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
 We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
 arXiv  Detail & Related papers  (2024-03-16T20:18:36Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
 We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
 arXiv  Detail & Related papers  (2023-07-16T15:18:25Z)
- Controllable Emphasis with zero data for text-to-speech [57.12383531339368]
 A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word.
We show that this is significantly better than spectrogram modification techniques, improving naturalness by 7.3% and correct testers' identification of the emphasised word in a sentence by 40% on a reference female en-US voice.
 arXiv  Detail & Related papers  (2023-07-13T21:06:23Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
 Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
 arXiv  Detail & Related papers  (2023-06-08T07:33:22Z)
- Evaluating context-invariance in unsupervised speech representations [15.67794428589585]
 Current benchmarks do not measure context-invariance.
We develop a new version of the ZeroSpeech ABX benchmark that measures context-invariance.
We demonstrate that the context-independence of representations is predictive of the stability of word-level representations.
 arXiv  Detail & Related papers  (2022-10-27T21:15:49Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
 Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
 arXiv  Detail & Related papers  (2022-05-21T16:52:57Z)
- Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks [9.160401226886947]
 The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech.
The primary goals of the collection were to create a representative, large-scale resource to study spontaneous spoken Finnish and to accelerate the development of language technology and speech-based services.
We present the collection process and the collected corpus, and showcase its versatility through multiple use cases.
 arXiv  Detail & Related papers  (2022-03-24T07:50:25Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
 We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
 arXiv  Detail & Related papers  (2021-10-31T22:48:30Z)
- WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition [25.31180901037065]
 WenetSpeech is a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech.
We collect the data from YouTube and Podcast, which covers a variety of speaking styles, scenarios, domains, topics, and noisy conditions.
 arXiv  Detail & Related papers  (2021-10-07T12:05:29Z)
- English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System [3.4888132404740797]
 We evaluate a state-of-the-art automatic speech recognition model, using unseen data from a corpus with a wide variety of labeled English accents.
We show that there is indeed an accuracy bias in terms of accentual variety, favoring the accents most prevalent in the training corpus.
 arXiv  Detail & Related papers  (2021-05-09T08:24:33Z)
- FT Speech: Danish Parliament Speech Corpus [21.190182627955817]
 This paper introduces FT Speech, a new speech corpus created from the recorded meetings of the Danish Parliament.
The corpus contains over 1,800 hours of transcribed speech by a total of 434 speakers.
It is significantly larger in duration, vocabulary, and amount of spontaneous speech than the existing public speech corpora for Danish.
 arXiv  Detail & Related papers  (2020-05-25T19:51:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     