Robust Spoken Language Understanding with RL-based Value Error Recovery
- URL: http://arxiv.org/abs/2009.03095v1
- Date: Mon, 7 Sep 2020 13:32:07 GMT
- Title: Robust Spoken Language Understanding with RL-based Value Error Recovery
- Authors: Chen Liu, Su Zhu, Lu Chen and Kai Yu
- Abstract summary: Spoken Language Understanding (SLU) aims to extract structured semantic representations (e.g., slot-value pairs) from speech recognized texts.
We propose a new robust SLU framework to guide the SLU input adaptation with a rule-based value error recovery module.
Experiments on the public CATSLU dataset show the effectiveness of our proposed approach.
- Score: 35.82890898452309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken Language Understanding (SLU) aims to extract structured semantic
representations (e.g., slot-value pairs) from speech recognized texts, which
suffers from errors of Automatic Speech Recognition (ASR). To alleviate the
problem caused by ASR errors, previous works either apply input adaptations to
the speech recognized texts, or correct ASR errors in predicted values by
searching for the most similar candidates by pronunciation. However, these two methods are
applied separately and independently. In this work, we propose a new robust SLU
framework to guide the SLU input adaptation with a rule-based value error
recovery module. The framework consists of a slot tagging model and a
rule-based value error recovery module. We pursue on an adapted slot tagging
model which can extract potential slot-value pairs mentioned in ASR hypotheses
and is suitable for the existing value error recovery module. After the value
error recovery, we can achieve a supervision signal (reward) by comparing
refined slot-value pairs with annotations. Since operations of the value error
recovery are non-differentiable, we exploit policy gradient based Reinforcement
Learning (RL) to optimize the SLU model. Extensive experiments on the public
CATSLU dataset show the effectiveness of our proposed approach, which can
improve the robustness of SLU and outperform the baselines by significant
margins.
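
The key mechanism above, a non-differentiable rule-based recovery step optimized with policy-gradient RL (REINFORCE), can be sketched in miniature. Everything below is a hypothetical toy, not the paper's implementation: the ontology, the candidate strings, and the `recover()` rule are made up, and character edit distance stands in for pronunciation similarity.

```python
import math
import random

random.seed(0)

# Toy ontology of valid slot values (a stand-in for a domain lexicon).
ONTOLOGY = ["beijing", "nanjing", "tianjin"]

def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def recover(value):
    """Rule-based value error recovery: snap a predicted value to the
    closest ontology entry. This lookup is non-differentiable, which is
    why a score-function (REINFORCE) gradient is needed upstream."""
    return min(ONTOLOGY, key=lambda cand: edit_distance(value, cand))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(candidates, gold, steps=300, lr=0.5):
    """REINFORCE over a categorical 'tagger' that picks one candidate
    value. Reward = 1 if the recovered value matches the annotation."""
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        i = random.choices(range(len(candidates)), weights=probs)[0]
        reward = 1.0 if recover(candidates[i]) == gold else 0.0
        # Score-function gradient: d log pi(i) / d logit_k = 1[k == i] - p_k
        for k in range(len(logits)):
            logits[k] += lr * reward * ((k == i) - probs[k])
    return softmax(logits)

# Noisy ASR-style value hypotheses; only "baijing" recovers to the gold value,
# so the policy should learn to prefer it.
final_probs = train(["baijing", "nenjing", "tianjing"], gold="beijing")
```

The reward never flows through `recover()` itself; it only reweights the sampled action's log-probability, which is what lets the non-differentiable recovery module supervise the tagger.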
Related papers
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z) - Towards ASR Robust Spoken Language Understanding Through In-Context
Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z) - ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for
Improving ASR Robustness in Spoken Language Understanding [55.39105863825107]
We propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL) to improve automatic speech recognition (ASR) robustness.
In fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively.
Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-11-19T16:53:35Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model.
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
arXiv Detail & Related papers (2022-02-02T17:32:59Z) - Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
arXiv Detail & Related papers (2021-11-16T03:00:29Z) - N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR
Hypotheses [0.0]
Spoken Language Understanding (SLU) parses speech into semantic structures like dialog acts and slots.
We show that our approach significantly outperforms the prior state-of-the-art when subjected to the low data regime.
arXiv Detail & Related papers (2021-06-11T17:29:00Z) - Do as I mean, not as I say: Sequence Loss Training for Spoken Language
Understanding [22.652754839140744]
Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech.
We propose non-differentiable sequence losses based on SLU metrics as a proxy for semantic error and use the REINFORCE trick to train ASR and SLU models with this loss.
We show that custom sequence loss training is the state-of-the-art on open SLU datasets and leads to 6% relative improvement in both ASR and NLU performance metrics.
arXiv Detail & Related papers (2021-02-12T20:09:08Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.