Bidirectional Representations for Low Resource Spoken Language Understanding
- URL: http://arxiv.org/abs/2211.14320v2
- Date: Sat, 14 Oct 2023 13:18:26 GMT
- Title: Bidirectional Representations for Low Resource Spoken Language Understanding
- Authors: Quentin Meeus, Marie-Francine Moens, Hugo Van hamme
- Abstract summary: We propose a representation model to encode speech in bidirectional rich encodings.
The approach uses a masked language modelling objective to learn the representations.
We show that the performance of the resulting encodings is better than comparable models on multiple datasets.
- Score: 39.208462511430554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most spoken language understanding systems use a pipeline approach composed
of an automatic speech recognition interface and a natural language
understanding module. This approach forces hard decisions when converting
continuous inputs into discrete language symbols. Instead, we propose a
representation model to encode speech in rich bidirectional encodings that can
be used for downstream tasks such as intent prediction. The approach uses a
masked language modelling objective to learn the representations, and thus
benefits from both the left and right contexts. We show that the performance of
the resulting encodings before fine-tuning is better than comparable models on
multiple datasets, and that fine-tuning the top layers of the representation
model improves the current state of the art on the Fluent Speech Commands
dataset, also in a low-data regime, when a limited amount of labelled data is
used for training. Furthermore, we propose class attention as a spoken language
understanding module, efficient both in terms of speed and number of
parameters. Class attention can be used to visually explain the predictions of
our model, which goes a long way in understanding how the model makes
predictions. We perform experiments in English and in Dutch.
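The class-attention module described above can be illustrated with a minimal numpy sketch: each intent class owns a learned query vector that attends over the encoder's bidirectional frame encodings, so the resulting attention weights show which frames drive each class score. All names and the exact parameterization below (`W_k`, `W_v`, scoring a class by the dot product of its pooled summary with its query) are illustrative assumptions, not the paper's precise architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_attention(frames, class_queries, W_k, W_v):
    """Single-head class attention (illustrative sketch).

    frames:        (T, d) bidirectional speech encodings from the encoder
    class_queries: (C, d) one learned query vector per intent class
    W_k, W_v:      (d, d) key/value projections (hypothetical parameters)

    Returns per-class logits (C,) and the attention map (C, T).
    """
    K = frames @ W_k                                     # (T, d) keys
    V = frames @ W_v                                     # (T, d) values
    scores = class_queries @ K.T / np.sqrt(K.shape[1])   # (C, T) scaled dot product
    attn = softmax(scores, axis=-1)                      # where each class "looks"
    pooled = attn @ V                                    # (C, d) per-class summary
    logits = (pooled * class_queries).sum(axis=-1)       # (C,) simple class scores
    return logits, attn
```

Because `attn` has one row per class over the time axis, plotting a row against time gives the kind of visual explanation of a prediction that the abstract describes, and the module stays cheap: its cost is one query per class rather than per frame.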
Related papers
- Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations [18.251845041785906]
We propose a framework to learn semantics from raw audio signals using two types of representations.
We introduce a speech-to-unit processing pipeline that captures two types of representations with different time resolutions.
For the language model, we adopt a dual-channel architecture to incorporate both types of representation.
arXiv Detail & Related papers (2024-02-02T10:39:58Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is the better-performing and more viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) noisy input to the language model.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under-represented languages.
We show that using noisy web-crawled data instead of structured data is better suited to such a non-standardized language.
Our best performing TunBERT model reaches or improves the state-of-the-art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Learning Spoken Language Representations with Neural Lattice Language Modeling [39.50831917042577]
We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks.
The proposed two-stage pre-training approach reduces the demands of speech data and has better efficiency.
arXiv Detail & Related papers (2020-07-06T10:38:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.