The Zero Resource Speech Challenge 2020: Discovering discrete subword
and word units
- URL: http://arxiv.org/abs/2010.05967v1
- Date: Mon, 12 Oct 2020 18:56:48 GMT
- Authors: Ewan Dunbar and Julien Karadayi and Mathieu Bernard and Xuan-Nga Cao
and Robin Algayres and Lucas Ondel and Laurent Besacier and Sakriani Sakti
and Emmanuel Dupoux
- Abstract summary: The Zero Resource Speech Challenge 2020 aims at learning speech representations from raw audio signals without any labels.
We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.
- Score: 40.41406551797358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Zero Resource Speech Challenge 2020, which aims at learning
speech representations from raw audio signals without any labels. It combines
the data sets and metrics from two previous benchmarks (2017 and 2019) and
features two tasks which tap into two levels of speech representation. The
first task is to discover low bit-rate subword representations that optimize
the quality of speech synthesis; the second one is to discover word-like units
from unsegmented raw speech. We present the results of the twenty submitted
models and discuss the implications of the main findings for unsupervised
speech learning.
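As an illustration of what "low bit-rate" means for the first task, the sketch below estimates the bitrate of a sequence of discrete units. It assumes the bitrate is the number of emitted symbols times the empirical entropy of the symbol distribution, divided by the total audio duration; the abstract does not spell out the formula, so treat this definition, the function name `estimate_bitrate`, and its inputs as assumptions rather than the challenge's official scoring code.

```python
import math
from collections import Counter


def estimate_bitrate(units, duration_seconds):
    """Estimate the bitrate (bits per second) of a discrete unit sequence.

    Assumed definition: n * H(units) / duration, where n is the number of
    symbols and H is the empirical entropy in bits. This is an illustrative
    sketch, not the official evaluation code.
    """
    if not units:
        raise ValueError("units must be a non-empty sequence")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    n = len(units)
    counts = Counter(units)
    # Empirical entropy in bits: sum over symbols of p * log2(1 / p).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return n * entropy / duration_seconds


# Hypothetical example: four symbols from a two-unit inventory over 2 seconds.
example_units = [0, 1, 0, 1]
example_bitrate = estimate_bitrate(example_units, duration_seconds=2.0)
```

Under this definition, a system that emits fewer symbols per second, or draws them from a more predictable inventory, gets a lower bitrate, which is the quantity the first task trades off against synthesis quality.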
Related papers
- Exploring Speech Recognition, Translation, and Understanding with
Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length.
arXiv Detail & Related papers (2023-09-27T17:21:13Z)
- Representation Learning With Hidden Unit Clustering For Low Resource
Speech Applications [37.89857769906568]
We describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework.
The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers.
The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations.
arXiv Detail & Related papers (2023-07-14T13:02:10Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
Compared with lower-level tasks such as speech and speaker recognition, there are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- The Ability of Self-Supervised Speech Models for Audio Representations [53.19715501273934]
Self-supervised learning (SSL) speech models have achieved unprecedented success in speech representation learning.
We conduct extensive experiments on abundant speech and non-speech audio datasets to evaluate the representation ability of state-of-the-art SSL speech models.
Results show that SSL speech models could extract meaningful features of a wide range of non-speech audio, while they may also fail on certain types of datasets.
arXiv Detail & Related papers (2022-09-26T15:21:06Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Direct speech-to-speech translation with discrete units [64.19830539866072]
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.
We propose to predict the self-supervised discrete representations learned from an unlabeled speech corpus instead.
When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual mode output (speech and text) simultaneously in the same inference pass.
arXiv Detail & Related papers (2021-07-12T17:40:43Z)
- The Interspeech Zero Resource Speech Challenge 2021: Spoken language
modelling [19.525392906001624]
We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels.
The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text.
arXiv Detail & Related papers (2021-04-29T23:53:37Z)
- Generative Spoken Language Modeling from Raw Audio [42.153136032037175]
Generative spoken language modeling involves jointly learning the acoustic and linguistic characteristics of a language from raw audio only (without text or labels).
We introduce metrics to automatically evaluate the generated output in terms of acoustic and linguistic quality in two associated end-to-end tasks.
We test baseline systems consisting of a discrete speech encoder (returning discrete, low-bitrate, pseudo-text units), a generative language model (trained on pseudo-text units) and a speech decoder.
arXiv Detail & Related papers (2021-02-01T21:41:40Z)