Crowdsourced Multilingual Speech Intelligibility Testing
- URL: http://arxiv.org/abs/2403.14817v1
- Date: Thu, 21 Mar 2024 20:14:53 GMT
- Title: Crowdsourced Multilingual Speech Intelligibility Testing
- Authors: Laura Lechler, Kamil Wojcicki
- Abstract summary: We propose an approach for a crowdsourced intelligibility assessment. Standards and recommendations are yet to be defined. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of generative audio features, there is an increasing need for rapid evaluation of their impact on speech intelligibility. Beyond the existing laboratory measures, which are expensive and do not scale well, there has been comparatively little work on crowdsourced assessment of intelligibility. Standards and recommendations are yet to be defined, and publicly available multilingual test materials are lacking. In response to this challenge, we propose an approach for a crowdsourced intelligibility assessment. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.
Related papers
- CogBench: A Large Language Model Benchmark for Multilingual Speech-Based Cognitive Impairment Assessment
We propose CogBench, the first benchmark designed to evaluate the cross-lingual and cross-site generalizability of large language models for speech-based cognitive impairment assessment.
Our results show that conventional deep learning models degrade substantially when transferred across domains.
Our findings offer a critical step toward building clinically useful and linguistically robust speech-based cognitive assessment tools.
(arXiv, 2025-08-05T12:06:16Z)
- QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
We introduce QualiSpeech, a comprehensive low-level speech quality assessment dataset.
We also propose the QualiSpeech Benchmark to evaluate the low-level speech understanding capabilities of auditory large language models.
(arXiv, 2025-03-26T07:32:20Z)
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions; for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks.
(arXiv, 2024-08-26T16:29:13Z)
- INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages
Indic QA Benchmark is a dataset for context-grounded question answering in 11 major Indian languages.
Evaluations revealed weak performance on low-resource languages due to a strong English-language bias in the models' training data.
We also investigated the Translate-Test paradigm, where inputs are translated into English for processing and the outputs are translated back into the source language.
(arXiv, 2024-07-18T13:57:16Z)
- Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders.
We propose using these speech recordings to verify the identities of enrolled patients and identify and exclude the individuals who try to enroll multiple times in the same trial.
We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing speech-impaired patients speaking English, German, Danish, Spanish, and Arabic.
(arXiv, 2024-04-02T14:19:30Z)
- Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models
We show that the observed high performance of multilingual models can be largely attributed to factors that do not require the transfer of actual linguistic knowledge.
More specifically, we observe that what is transferred across languages consists mostly of data artifacts and biases, especially for low-resource languages.
(arXiv, 2024-02-03T09:41:52Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
(arXiv, 2023-10-09T08:30:01Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
(arXiv, 2023-07-16T15:18:25Z)
- On Evaluating and Mitigating Gender Biases in Multilingual Settings
We investigate some of the challenges with evaluating and mitigating biases in multilingual settings.
We first create a benchmark for evaluating gender biases in pre-trained masked language models.
We extend various debiasing methods to work beyond English and evaluate their effectiveness on state-of-the-art (SotA) massively multilingual models.
(arXiv, 2023-07-04T06:23:04Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
(arXiv, 2023-05-29T11:54:50Z)
- Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs.
Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings.
(arXiv, 2023-05-25T03:03:29Z)
- Delving Deeper into Cross-lingual Visual Question Answering
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
(arXiv, 2022-02-15T18:22:18Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
(arXiv, 2022-01-27T18:53:22Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
(arXiv, 2021-04-17T20:23:45Z)