BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
- URL: http://arxiv.org/abs/2306.01506v2
- Date: Thu, 8 Jun 2023 12:22:30 GMT
- Title: BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
- Authors: Marvin Lavechin and Yaya Sy and Hadrien Titeux and María Andrea Cruz Blandón and Okko Räsänen and Hervé Bredin and Emmanuel Dupoux and Alejandrina Cristia
- Abstract summary: Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
- Score: 56.93604813379634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised techniques for learning speech representations have been
shown to develop linguistic competence from exposure to speech without the need
for human labels. In order to fully realize the potential of these approaches
and further our understanding of how infants learn language, simulations must
closely emulate real-life situations by training on developmentally plausible
corpora and benchmarking against appropriate test sets. To this end, we propose
a language-acquisition-friendly benchmark to probe spoken language models at
the lexical and syntactic levels, both of which are compatible with the
vocabulary typical of children's language experiences. This paper introduces
the benchmark and summarizes a range of experiments showing its usefulness. In
addition, we highlight two exciting challenges that need to be addressed for
further progress: bridging the gap between text and speech and between clean
speech and in-the-wild speech.
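The paper specifies the exact tasks; as a rough, hypothetical illustration of how lexical (word vs. pseudoword) and syntactic (grammatical vs. ungrammatical) probes of this kind are commonly scored, the minimal-pair accuracy below compares model scores for the two members of each pair. The function names and toy pairs are illustrative placeholders, not taken from BabySLM:

```python
from typing import Callable, List, Tuple

def minimal_pair_accuracy(score_fn: Callable[[str], float],
                          pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the model assigns
    a higher score (e.g., log-probability) to the acceptable member."""
    correct = sum(score_fn(good) > score_fn(bad) for good, bad in pairs)
    return correct / len(pairs)

# Toy minimal pairs in the spirit of lexical and syntactic probing.
lexical_pairs = [("dog", "dag"), ("water", "wuter")]          # word vs. pseudoword
syntactic_pairs = [("the dog is sleeping", "the dog are sleeping")]

# Placeholder scorer; a real evaluation would call a trained (spoken) LM here.
def toy_score_fn(utterance: str) -> float:
    return -len(utterance)

print(minimal_pair_accuracy(toy_score_fn, lexical_pairs))
print(minimal_pair_accuracy(toy_score_fn, syntactic_pairs))
```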
Related papers
- DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment [82.86363991170546]
We propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities.
Our model demonstrates superior performance on the Dynamic-SUPERB benchmark, particularly in generalizing to unseen tasks.
These findings highlight the potential to reshape instruction-following SLMs by incorporating descriptive, rich speech captions.
arXiv Detail & Related papers (2024-06-27T03:52:35Z)
- Saving the legacy of Hero Ibash: Evaluating Four Language Models for Aminoacian [0.8158530638728501]
This study assesses four cutting-edge language models in the underexplored Aminoacian language.
It scrutinizes their adaptability, effectiveness, and limitations in text generation, semantic coherence, and contextual understanding.
arXiv Detail & Related papers (2024-02-28T07:22:13Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
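The paper gives the full method; the sketch below only illustrates the general idea behind lexicon-based, human-interpretable representations, where each feature dimension corresponds to a named lexicon category rather than an opaque neuron. The mini-lexicon and category names here are invented for illustration:

```python
from collections import Counter
from typing import List

# Invented mini-lexicon; real systems use curated resources with many
# more entries and categories.
LEXICON = {
    "happy": "positive_emotion",
    "glad": "positive_emotion",
    "sad": "negative_emotion",
    "think": "cognition",
    "because": "cognition",
}
CATEGORIES = sorted(set(LEXICON.values()))

def lexicon_features(text: str) -> List[float]:
    """Encode text as normalized counts over lexicon categories, giving
    one human-interpretable dimension per category."""
    tokens = text.lower().split()
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    return [counts[c] / max(len(tokens), 1) for c in CATEGORIES]

print(CATEGORIES)
print(lexicon_features("I am glad because I think it works"))
```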
- DALL-E 2 Fails to Reliably Capture Common Syntactic Processes [0.0]
We analyze the ability of DALL-E 2 to capture 8 grammatical phenomena pertaining to compositionality.
We show that DALL-E 2 is unable to reliably infer meanings that are consistent with the syntax.
arXiv Detail & Related papers (2022-10-23T23:56:54Z)
- Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal [2.639737913330821]
We review the literature on the role of embodiment and emotion in the interactive setting of spoken dialogue as necessary prerequisites for language learning for human children.
We sketch a model of semantics that leverages current transformer-based models and a word-level grounded model, then explain the robot-dialogue system that will make use of our semantic model.
arXiv Detail & Related papers (2021-05-10T19:40:17Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
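AM2iCo defines its task precisely in the paper; the fragment below sketches only a common word-in-context baseline such benchmarks are evaluated with, thresholding the cosine similarity of two contextual embeddings of the target word. The vectors and threshold are placeholders:

```python
import numpy as np

def same_meaning(vec_a: np.ndarray, vec_b: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Decide whether two contextual embeddings of a target word (e.g.,
    from two languages) express the same meaning, via cosine similarity."""
    cos = vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
    return bool(cos >= threshold)

# Toy vectors standing in for outputs of a multilingual encoder.
rng = np.random.default_rng(0)
v_en, v_de = rng.normal(size=32), rng.normal(size=32)
print(same_meaning(v_en, v_de))
```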
- The Rediscovery Hypothesis: Language Models Need to Meet Linguistics [8.293055016429863]
We study whether linguistic knowledge is a necessary condition for good performance of modern language models.
We show that language models that are significantly compressed but perform well on their pretraining objectives retain good scores when probed for linguistic structures.
This result supports the rediscovery hypothesis and leads to the second contribution of our paper: an information-theoretic framework that relates the language modeling objective to linguistic information.
arXiv Detail & Related papers (2021-03-02T15:57:39Z)
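The paper's information-theoretic framework is its own contribution; the snippet below only illustrates the standard probing setup the summary refers to, training a linear classifier on frozen model representations to predict a linguistic label. The data here is random, so this probe scores at chance; real probes use actual LM activations and gold annotations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins for frozen token representations and linguistic labels;
# real probing uses activations from a (possibly compressed) LM and
# annotations such as part-of-speech tags.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 64))
labels = rng.integers(0, 3, size=200)

probe = LogisticRegression(max_iter=1000).fit(hidden[:150], labels[:150])
print("probe accuracy:", probe.score(hidden[150:], labels[150:]))
```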
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
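The paper's regularizer operates at both word and sentence level without external resources; the toy penalty below shows only the generic shape of such an alignment objective, pulling pooled representations of parallel utterances together alongside the task loss. The 0.1 weight and the random vectors are placeholders:

```python
import numpy as np

def alignment_penalty(src: np.ndarray, tgt: np.ndarray) -> float:
    """Mean squared distance between sentence-level representations of a
    parallel utterance pair; minimizing it aligns the two spaces."""
    return float(np.mean((src - tgt) ** 2))

# Mean-pooled token vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
src_sent = rng.normal(size=(8, 32)).mean(axis=0)
tgt_sent = rng.normal(size=(8, 32)).mean(axis=0)

task_loss = 0.42  # placeholder downstream loss
total = task_loss + 0.1 * alignment_penalty(src_sent, tgt_sent)
print(total)
```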
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.