Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages
- URL: http://arxiv.org/abs/2110.09264v1
- Date: Mon, 18 Oct 2021 13:06:59 GMT
- Title: Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages
- Authors: Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black,
Rajiv Ratn Shah
- Abstract summary: Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil, each with different data sizes to simulate high, medium, and low resource scenarios.
- Score: 67.40810139354028
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Building Spoken Language Understanding (SLU) systems that do not rely on
language specific Automatic Speech Recognition (ASR) is an important yet less
explored problem in language processing. In this paper, we present a
comparative study aimed at employing a pre-trained acoustic model to perform
SLU in low resource scenarios. Specifically, we use three different embeddings
extracted using Allosaurus, a pre-trained universal phone decoder: (1) Phone
(2) Panphone, and (3) Allo embeddings. These embeddings are then used in
identifying the spoken intent. We perform experiments across three different
languages: English, Sinhala, and Tamil, each with different data sizes to
simulate high, medium, and low resource scenarios. Our system improves on the
state-of-the-art (SOTA) intent classification accuracy by approximately 2.11%
for Sinhala and 7.00% for Tamil and achieves competitive results on English.
Furthermore, we present a quantitative analysis of how the performance scales
with the number of training examples used per intent.
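As a hypothetical illustration of the general approach (not the paper's actual system), utterance-level embeddings pooled from per-frame acoustic embeddings can drive a simple nearest-centroid intent classifier. All names and data below are illustrative; the paper's Phone, Panphone, and Allo embeddings come from Allosaurus, whereas this sketch just assumes fixed-dimensional vectors.

```python
# Sketch: nearest-centroid intent classification over utterance embeddings
# (e.g., mean-pooled per-frame embeddings). Data here is toy/illustrative.

def mean_pool(frames):
    """Average a list of per-frame embedding vectors into one vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def centroids(train):
    """train: list of (embedding, intent) pairs -> intent -> centroid."""
    by_intent = {}
    for emb, intent in train:
        by_intent.setdefault(intent, []).append(emb)
    return {k: mean_pool(v) for k, v in by_intent.items()}

def classify(emb, cents):
    """Return the intent whose centroid is closest in squared L2 distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cents, key=lambda k: dist(emb, cents[k]))

# Toy 2-D "embeddings" for two intents.
train = [([1.0, 0.0], "lights_on"), ([0.9, 0.1], "lights_on"),
         ([0.0, 1.0], "play_music"), ([0.1, 0.9], "play_music")]
cents = centroids(train)
print(classify([0.8, 0.2], cents))  # lights_on
```

In practice one would replace the toy vectors with pooled Allosaurus outputs and the centroid rule with a trained classifier; the sketch only shows the embed-then-classify pipeline shape.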
Related papers
- The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z)
- On Building Spoken Language Understanding Systems for Low Resourced
Languages [1.2183405753834562]
We present a series of experiments to explore extremely low-resourced settings.
We perform intent classification with systems trained on as low as one data-point per intent and with only one speaker in the dataset.
We find that using phonetic transcriptions to build intent classification systems in such low-resourced settings performs significantly better than using speech features.
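A minimal sketch of what a transcription-based classifier could look like, assuming one labeled phone transcription per intent (the phone strings, intent names, and similarity rule below are hypothetical, not taken from the paper):

```python
from collections import Counter
from math import sqrt

# Sketch: intent classification from phonetic transcriptions using
# bag-of-phone counts and cosine similarity to one example per intent.

def bag(phones):
    """Count phones in a space-separated transcription."""
    return Counter(phones.split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(phones, examples):
    """examples: intent -> phone transcription (one data point per intent)."""
    b = bag(phones)
    return max(examples, key=lambda i: cosine(b, bag(examples[i])))

# Toy phone strings for two intents (hypothetical transcriptions).
examples = {"lights_on": "l ai t s o n", "play_music": "p l ei m j u z i k"}
print(classify("l ai t o n", examples))  # lights_on
```

Even with a noisy phone decode ("l ai t o n" missing the "s"), phone overlap still points at the right intent, which is the intuition behind transcription-based classification at one example per intent.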
arXiv Detail & Related papers (2022-05-25T14:44:51Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech
Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Revisiting Tri-training of Dependency Parsers [10.977756226111348]
We compare two semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing.
We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings.
We find that pretrained word embeddings make more effective use of unlabelled data than tri-training but that the two approaches can be successfully combined.
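The core tri-training rule can be sketched as follows: when two of the three models agree on an unlabeled example, their shared label is added to the third model's training set. The toy "models" below are illustrative stand-ins, not the paper's dependency parsers:

```python
# Toy sketch of the tri-training agreement rule (illustrative only).

def tri_train_round(models, unlabeled):
    """models: three callables x -> label.
    Returns new_data[i]: (x, label) pairs to add to model i's training set,
    taken from examples where the other two models agree."""
    new_data = [[], [], []]
    for x in unlabeled:
        preds = [m(x) for m in models]
        for i in range(3):
            j, k = [n for n in range(3) if n != i]
            if preds[j] == preds[k]:
                new_data[i].append((x, preds[j]))
    return new_data

# Three toy "models" labeling integers; model c is unreliable.
a = lambda x: "even" if x % 2 == 0 else "odd"
b = lambda x: "even" if x % 2 == 0 else "odd"
c = lambda x: "even"  # always guesses "even"

new = tri_train_round([a, b, c], [1, 2, 3, 4])
print(new[2])  # pairs a and b agreed on, destined for model c
```

In a real run each model would then be retrained on its augmented data and the loop repeated; the sketch shows only the agreement-based labeling step.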
arXiv Detail & Related papers (2021-09-16T17:19:05Z)
- From Masked Language Modeling to Translation: Non-English Auxiliary
Tasks Improve Zero-shot Spoken Language Understanding [24.149299722716155]
We introduce xSID, a new benchmark for cross-lingual Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect.
We propose a joint learning approach, with English SLU training data and non-English auxiliary tasks from raw text, syntax and translation for transfer.
Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
arXiv Detail & Related papers (2021-05-15T23:51:11Z)
- Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent
Prediction and Slot Filling [29.17194639368877]
We propose a novel method to augment the monolingual source data using multilingual code-switching via random translations.
Experiments on the benchmark dataset of MultiATIS++ yielded an average improvement of +4.2% in accuracy for the intent task and +1.8% in F1 for the slot task.
We present an application of our method for crisis informatics using a new human-annotated tweet dataset of slot filling in English and Haitian Creole.
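One simple way to realize code-switching augmentation is to randomly swap source tokens for translations from a multilingual lexicon. The lexicon, replacement rate, and token handling below are a hypothetical sketch, not the paper's method (which uses random translations rather than a fixed lexicon):

```python
import random

# Sketch: code-switching data augmentation by randomly replacing tokens
# with translations from a toy multilingual lexicon (illustrative only).

LEXICON = {  # token -> candidate translations in other languages
    "play": ["jouer", "spielen"],
    "music": ["musique", "Musik"],
}

def code_switch(tokens, rate=0.5, rng=None):
    """Replace each lexicon token with a random translation at the given rate."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in LEXICON and rng.random() < rate:
            out.append(rng.choice(LEXICON[tok]))
        else:
            out.append(tok)
    return out

print(code_switch(["please", "play", "some", "music"]))
```

Augmented sentences like "please jouer some Musik" expose the model to mixed-language contexts at training time, which is the intuition behind zero-shot cross-lingual transfer via code-switching.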
arXiv Detail & Related papers (2021-03-13T21:05:09Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained on varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system with low data cost for low-resource languages.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.