Semantic enrichment towards efficient speech representations
- URL: http://arxiv.org/abs/2307.01323v1
- Date: Mon, 3 Jul 2023 19:52:56 GMT
- Title: Semantic enrichment towards efficient speech representations
- Authors: Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian,
Yannick Estève
- Abstract summary: This study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model.
We show the benefits of using same-domain French and Italian benchmarks for low-resource language portability.
- Score: 9.30840529284715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, self-supervised learned speech representations have
emerged as fruitful replacements for conventional surface representations when
solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual
models trained on massive textual data were introduced to encode language
agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to
benefit from such textual models by enriching multilingual speech
representations with language-agnostic semantics. Aiming for better semantic
extraction on a challenging Spoken Language Understanding task while keeping
computation costs in check, this study investigates a specific in-domain
semantic enrichment of the SAMU-XLSR model by specializing it on a small amount
of transcribed data from the downstream task. In addition, we show the benefits
of using same-domain French and Italian benchmarks for low-resource language
portability and explore the cross-domain capacities of the enriched SAMU-XLSR.
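The in-domain enrichment described above specializes the speech encoder so that its pooled utterance embedding matches the frozen sentence embedding of the transcription. A minimal NumPy sketch of such a cosine-alignment objective follows; the function name and the use of a simple mean-of-(1 - cosine) loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_alignment_loss(speech_emb, text_emb, eps=1e-8):
    """Simplified SAMU-XLSR-style alignment objective: pull each pooled
    speech embedding toward the frozen sentence embedding of its
    transcription by maximising their cosine similarity.
    Shapes: (batch, dim) for both inputs."""
    s = speech_emb / (np.linalg.norm(speech_emb, axis=1, keepdims=True) + eps)
    t = text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + eps)
    cos = np.sum(s * t, axis=1)       # per-utterance cosine similarity
    return float(np.mean(1.0 - cos))  # 0 when embeddings are perfectly aligned

# Identical embeddings give (near-)zero loss; orthogonal ones give loss 1.
aligned = cosine_alignment_loss(np.eye(2), np.eye(2))
orthogonal = cosine_alignment_loss(np.array([[1.0, 0.0]]),
                                   np.array([[0.0, 1.0]]))
```

In practice the text encoder stays frozen while gradients flow only into the speech encoder, which is what makes the enrichment cheap relative to full multimodal pre-training.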
Related papers
- A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding [12.887586659035497]
Self-Supervised Learning is widely used to efficiently represent speech for Spoken Language Understanding.
Textual SSL models have been proposed to encode language-agnostic semantics.
The SAMU-XLSR framework employs this semantic information to enrich multilingual speech representations.
arXiv Detail & Related papers (2024-06-17T23:07:53Z) - MINERS: Multilingual Language Models as Semantic Retrievers [23.686762008696547]
This paper introduces the MINERS, a benchmark designed to evaluate the ability of multilingual language models in semantic retrieval tasks.
We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages.
Our results demonstrate that simply retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches.
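Embedding-based semantic retrieval of the kind MINERS benchmarks reduces to ranking corpus vectors by cosine similarity to a query vector. The sketch below is a hypothetical helper (the `retrieve` name and toy 2-D embeddings are assumptions for illustration, not the benchmark's API).

```python
import numpy as np

def retrieve(query_emb, corpus_embs, k=1):
    """Rank corpus entries by cosine similarity to the query and return
    the indices of the top-k matches."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity per corpus entry
    return np.argsort(-scores)[:k].tolist()

# Toy 2-D "embeddings": the second corpus vector points the same way as
# the query, so it is retrieved first.
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.6, 0.8])
top = retrieve(query, corpus, k=1)
```

With multilingual encoders, the same ranking works across languages because semantically equivalent sentences land near each other in the shared embedding space.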
arXiv Detail & Related papers (2024-06-11T16:26:18Z) - Label Aware Speech Representation Learning For Language Identification [49.197215416945596]
We propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task.
This framework, termed Label Aware Speech Representation (LASR) learning, uses a triplet-based objective function to incorporate language labels alongside the self-supervised loss function.
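A triplet objective of the kind LASR adds chooses an anchor and a positive from the same language and a negative from a different language, then pushes same-language embeddings closer than different-language ones by a margin. The sketch below uses Euclidean distance and a margin of 0.5; these specifics are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def language_triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet objective sketch in the spirit of LASR: anchor and positive
    share a language label, the negative does not. The loss is zero once
    the positive is closer to the anchor than the negative by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

# Well-separated languages incur no loss; a confusable pair does.
loss_good = language_triplet_loss(np.zeros(2),
                                  np.array([0.1, 0.0]),   # same language, close
                                  np.array([1.0, 0.0]))   # other language, far
loss_bad = language_triplet_loss(np.zeros(2),
                                 np.array([1.0, 0.0]),    # same language, far
                                 np.array([0.1, 0.0]))    # other language, close
```

In LASR this term is added to the self-supervised loss, so the representation keeps its phonetic detail while becoming more separable by language.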
arXiv Detail & Related papers (2023-06-07T12:14:16Z) - The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
arXiv Detail & Related papers (2022-10-14T01:24:03Z) - Transducer-based language embedding for spoken language identification [38.60303603000269]
The acoustic and linguistic features are important cues for the spoken language identification task.
Recent advanced LID systems mainly use acoustic features and lack explicit linguistic feature encoding.
We propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework.
arXiv Detail & Related papers (2022-04-08T07:23:43Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language
Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - SPLAT: Speech-Language Joint Pre-Training for Spoken Language
Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze the input acoustic signal, understand its linguistic content, and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.