Intent Recognition and Unsupervised Slot Identification for Low
Resourced Spoken Dialog Systems
- URL: http://arxiv.org/abs/2104.01287v1
- Date: Sat, 3 Apr 2021 01:58:27 GMT
- Title: Intent Recognition and Unsupervised Slot Identification for Low
Resourced Spoken Dialog Systems
- Authors: Akshat Gupta, Sai Krishna Rallabandi, Alan W Black
- Abstract summary: We present an acoustic based SLU system that converts speech to its phonetic transcription using a universal phone recognition system.
We build a word-free natural language understanding module that does intent recognition and slot identification from these phonetic transcription.
We observe more than 10% improvement for intent classification in Tamil and more than 5% improvement for intent classification in Sinhala.
- Score: 46.705058576039065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intent Recognition and Slot Identification are crucial components in spoken
language understanding (SLU) systems. In this paper, we present a novel
approach towards both these tasks in the context of low resourced and unwritten
languages. We present an acoustic based SLU system that converts speech to its
phonetic transcription using a universal phone recognition system. We build a
word-free natural language understanding module that does intent recognition
and slot identification from these phonetic transcription. Our proposed SLU
system performs competitively for resource rich scenarios and significantly
outperforms existing approaches as the amount of available data reduces. We
observe more than 10% improvement for intent classification in Tamil and more
than 5% improvement for intent classification in Sinhala. We also present a
novel approach towards unsupervised slot identification using normalized
attention scores. This approach can be used for unsupervised slot labelling,
data augmentation and to generate data for a new slot in a one-shot way with
only one speech recording
Related papers
- Generalized zero-shot audio-to-intent classification [7.76114116227644]
We propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent.
We leverage a neural audio synthesizer to create audio embeddings for sample text utterances.
Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2%.
arXiv Detail & Related papers (2023-11-04T18:55:08Z) - Improving Textless Spoken Language Understanding with Discrete Units as
Intermediate Target [58.59044226658916]
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances.
We propose to use discrete units as intermediate guidance to improve textless SLU performance.
arXiv Detail & Related papers (2023-05-29T14:00:24Z) - Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves similar performance as supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z) - On Building Spoken Language Understanding Systems for Low Resourced
Languages [1.2183405753834562]
We present a series of experiments to explore extremely low-resourced settings.
We perform intent classification with systems trained on as low as one data-point per intent and with only one speaker in the dataset.
We find that using phonetic transcriptions to make intent classification systems in such low-resourced setting performs significantly better than using speech features.
arXiv Detail & Related papers (2022-05-25T14:44:51Z) - Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer
in ASR [13.726142328715897]
We present a method for cross-lingual training an ASR system using absolutely no transcribed training data from the target language.
Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language.
arXiv Detail & Related papers (2021-11-12T16:16:46Z) - Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z) - Unsupervised Acoustic Unit Discovery by Leveraging a
Language-Independent Subword Discriminative Feature Representation [31.87235700253597]
This paper tackles automatically discovering phone-like acoustic units (AUD) from unlabeled speech data.
We propose a two-stage approach: the first stage learns a subword-discriminative feature representation and the second stage applies clustering to the learned representation and obtains phone-like clusters as the discovered acoustic units.
arXiv Detail & Related papers (2021-04-02T11:43:07Z) - Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and
language Models for Intent Classification [81.80311855996584]
We propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model.
We achieve 90.86% and 99.07% accuracy on ATIS and Fluent speech corpus, respectively.
arXiv Detail & Related papers (2021-02-15T07:20:06Z) - Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two languages families - Indic languages and Romance languages, for two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.