Towards Learning a Universal Non-Semantic Representation of Speech
- URL: http://arxiv.org/abs/2002.12764v6
- Date: Thu, 6 Aug 2020 04:53:37 GMT
- Title: Towards Learning a Universal Non-Semantic Representation of Speech
- Authors: Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de
Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv
- Abstract summary: This paper proposes a benchmark for comparing speech representations on non-semantic tasks, together with a representation based on an unsupervised triplet-loss objective.
The proposed representation outperforms other representations on the benchmark, and even exceeds state-of-the-art performance on a number of transfer learning tasks.
- Score: 18.54874934311111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ultimate goal of transfer learning is to reduce labeled data requirements
by exploiting a pre-existing embedding model trained for different datasets or
tasks. The visual and language communities have established benchmarks to
compare embeddings, but the speech community has yet to do so. This paper
proposes a benchmark for comparing speech representations on non-semantic
tasks, together with a representation based on an unsupervised triplet-loss
objective. The proposed representation outperforms other representations on the
benchmark, and even exceeds state-of-the-art performance on a number of
transfer learning tasks. The embedding is trained on a publicly available
dataset, and it is tested on a variety of low-resource downstream tasks,
including personalization tasks and tasks from the medical domain. The benchmark, models, and
evaluation code are publicly released.
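The unsupervised triplet-loss objective mentioned in the abstract can be summarized in a few lines. Below is a minimal sketch, assuming a generic encoder and the self-supervision signal of temporal proximity (anchor and positive drawn from nearby windows of the same recording); the names `AudioEncoder`, `triplet_loss`, and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of an unsupervised triplet-loss objective for audio
# embeddings. Architecture and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Toy encoder: log-mel frames -> L2-normalized embedding (assumed architecture)."""
    def __init__(self, n_mels: int = 64, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # average over time
            nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, n_mels, frames)
        return F.normalize(self.net(x), dim=-1)

def triplet_loss(anchor, positive, negative, margin: float = 0.1) -> torch.Tensor:
    """Hinge loss: pull the anchor toward the positive, push it from the negative."""
    d_pos = (anchor - positive).pow(2).sum(-1)
    d_neg = (anchor - negative).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()

# Self-supervision signal: anchor and positive are nearby windows of the
# same recording; the negative comes from a different recording.
enc = AudioEncoder()
anchor_x, positive_x, negative_x = (torch.randn(8, 64, 100) for _ in range(3))
loss = triplet_loss(enc(anchor_x), enc(positive_x), enc(negative_x))
loss.backward()
```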
Related papers
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
Compared to other speech tasks, there are relatively few SLU benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Sentence Representation Learning with Generative Objective rather than
Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative objective yields strong performance improvements and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas (a minimal sketch of a contrastive objective appears after this list).
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation
on Natural Speech [44.68649535280397]
We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
arXiv Detail & Related papers (2021-11-19T18:59:23Z)
- An Exploration of Self-Supervised Pretrained Representations for
End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers [0.05857406612420462]
Large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks.
We propose evaluating systems through a novel measure of prediction coherence.
arXiv Detail & Related papers (2021-09-10T15:04:23Z)
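As referenced in the review entry above, contrastive methods are one of the three main categories of self-supervised speech representation learning. The sketch below shows an InfoNCE-style contrastive objective, a common instance of that category; it is an illustrative assumption rather than code from any of the listed papers, and the stand-in tensors would in practice be encoder outputs for two augmented views of the same utterances.

```python
# Illustrative InfoNCE-style contrastive objective (an assumption, not
# code from the listed papers). Each query's positive key shares its
# batch index; all other keys in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature   # (batch, batch) cosine similarities
    targets = torch.arange(queries.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Stand-ins for embeddings of two augmented views of the same utterances.
q, k = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce(q, k))
```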
This list is automatically generated from the titles and abstracts of the papers on this site.