ProsAudit, a prosodic benchmark for self-supervised speech models
- URL: http://arxiv.org/abs/2302.12057v3
- Date: Thu, 1 Jun 2023 08:11:15 GMT
- Title: ProsAudit, a prosodic benchmark for self-supervised speech models
- Authors: Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux, Arthur Thomas,
Gwendal Virlet, Andrea Santos Revilla, Guillaume Wisniewski, Bogdan Ludusan,
Emmanuel Dupoux
- Abstract summary: ProsAudit is a benchmark to assess structural prosodic knowledge in self-supervised learning (SSL) speech models.
It consists of two subtasks, their corresponding metrics, and an evaluation dataset.
- Score: 14.198508548718676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ProsAudit, a benchmark in English to assess structural prosodic
knowledge in self-supervised learning (SSL) speech models. It consists of two
subtasks, their corresponding metrics, and an evaluation dataset. In the
protosyntax task, the model must correctly identify strong versus weak prosodic
boundaries. In the lexical task, the model needs to correctly distinguish
between pauses inserted between words and within words. We also provide human
evaluation scores on this benchmark. We evaluated a series of SSL models and
found that they were all able to perform above chance on both tasks, even when
evaluated on an unseen language. However, non-native models performed
significantly worse than native ones on the lexical task, highlighting the
importance of lexical knowledge in this task. We also found a clear effect of
size, with models trained on more data performing better on the two subtasks.
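To make the protocol concrete, here is a minimal sketch of how a minimal-pair benchmark of this kind can be scored, assuming a ZeroSpeech-style setup in which the model assigns a single scalar score (for instance, a pseudo log-likelihood) to each utterance. The function name `score_utterance` and the pair format are illustrative assumptions, not ProsAudit's actual interface.

```python
# Minimal sketch of minimal-pair scoring for a prosodic benchmark of this
# kind. Assumption: the model exposes a scalar score per utterance (e.g., a
# pseudo log-likelihood); `score_utterance` and the pair format are
# hypothetical, not ProsAudit's actual API.
from typing import Callable, Iterable, Tuple


def pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score_utterance: Callable[[str], float],
) -> float:
    """Fraction of pairs where the natural utterance outscores its
    prosodically implausible counterpart; ties count as half, so chance
    level is 0.5."""
    total, correct = 0, 0.0
    for natural_wav, manipulated_wav in pairs:
        s_nat = score_utterance(natural_wav)
        s_man = score_utterance(manipulated_wav)
        if s_nat > s_man:
            correct += 1.0
        elif s_nat == s_man:
            correct += 0.5
        total += 1
    return correct / max(total, 1)


# Hypothetical usage for the lexical subtask: each pair contrasts a pause
# inserted at a word boundary (natural) with the same pause inserted inside
# a word (implausible).
# lexical_pairs = [("pause_between_words.wav", "pause_within_word.wav")]
# print(pair_accuracy(lexical_pairs, my_model_score))
```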
Related papers
- Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set [0.0]
We show an increase in accuracy on the adversarial test set (+13%) while still maintaining good performance on the original NLI task.
We also show an increase in accuracy from 91.2% to 92.9% on the most similar contradictions in the SNLI test set (as judged by cosine similarity).
arXiv Detail & Related papers (2024-10-30T15:27:55Z)
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- Mispronunciation detection using self-supervised speech representations [10.010024759851142]
We study the use of SSL models for the task of mispronunciation detection for second language learners.
We compare two downstream approaches: 1) training the model for phone recognition using native English data, and 2) training a model directly for the target task using non-native English data.
arXiv Detail & Related papers (2023-07-30T21:20:58Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes; a generic sketch of such a unit-level LM follows this entry.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
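As a generic illustration of the idea (not the cited paper's exact architecture), a next-unit LSTM language model over discrete linguistic units can be as small as the following sketch; the vocabulary size and layer dimensions are placeholder assumptions.

```python
# Minimal sketch of a next-unit LSTM language model over discrete linguistic
# units (e.g., phoneme IDs). A generic illustration, NOT the cited paper's
# exact architecture; vocabulary size and dimensions are assumptions.
import torch
import torch.nn as nn


class UnitLSTMLM(nn.Module):
    def __init__(self, n_units: int = 64, emb: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(n_units, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_units)

    def forward(self, units: torch.Tensor) -> torch.Tensor:
        # units: (batch, time) integer unit IDs; returns next-unit logits.
        out, _ = self.lstm(self.embed(units))
        return self.head(out)


# Training objective (teacher forcing): cross-entropy of each position's
# logits against the next unit in the sequence, e.g.
# logits = model(units[:, :-1])
# loss = nn.functional.cross_entropy(
#     logits.reshape(-1, logits.size(-1)), units[:, 1:].reshape(-1))
```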
- A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English [17.993417004424078]
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on.
We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks.
arXiv Detail & Related papers (2020-11-02T13:25:39Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource; a generic sketch of an alignment penalty follows this entry.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
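As a generic illustration only (the cited paper's actual regularizer may differ), one simple way to encourage cross-language alignment without parallel data is to penalize the gap between batch statistics of the two languages' sentence embeddings:

```python
# Minimal sketch: a moment-matching alignment penalty between batches of
# sentence embeddings from two languages. A generic illustration, NOT the
# cited paper's method; names and the moment-matching choice are assumptions.
import torch


def alignment_penalty(h_src: torch.Tensor, h_tgt: torch.Tensor) -> torch.Tensor:
    """Squared gap between per-dimension batch statistics of source- and
    target-language embeddings of shape (batch, dim). Needs no parallel
    data: batches from each language can be unpaired."""
    mean_gap = (h_src.mean(dim=0) - h_tgt.mean(dim=0)).pow(2).sum()
    var_gap = (h_src.var(dim=0) - h_tgt.var(dim=0)).pow(2).sum()
    return mean_gap + var_gap


# Hypothetical use inside a training step, added to the task loss:
# loss = task_loss + lambda_align * alignment_penalty(h_en, h_other)
```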
- Labeling Explicit Discourse Relations using Pre-trained Language Models [0.0]
State-of-the-art models achieve slightly above 45% F-score using hand-crafted features.
We find that pre-trained language models, when fine-tuned, are powerful enough to replace the linguistic features.
This is the first time a model has outperformed the knowledge-intensive models without employing any linguistic features.
arXiv Detail & Related papers (2020-06-21T17:18:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.