A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding
- URL: http://arxiv.org/abs/2204.03879v1
- Date: Fri, 8 Apr 2022 07:12:11 GMT
- Title: A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding
- Authors: Nick J.C. Wang, Shaojun Wang, Jing Xiao
- Abstract summary: We compare different ways to combine ASR and NLU, in particular using a single Conformer model.
We find that it is not necessarily a choice between two-stage decoding and end-to-end systems which determines the best system for research or application.
- Score: 25.41993752756759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language understanding (SLU) combines automatic speech
recognition (ASR) and natural language understanding (NLU) capabilities to
accomplish speech-to-intent understanding. In this paper, we compare different
ways to combine ASR and NLU,
in particular using a single Conformer model with different ways to use its
components, to better understand the strengths and weaknesses of each approach.
We find that it is not necessarily a choice between two-stage decoding and
end-to-end systems which determines the best system for research or
application. System optimization still entails carefully improving the
performance of each component. It is difficult to prove that one direction is
conclusively better than the other. In this paper, we also propose a novel
connectionist temporal summarization (CTS) method to reduce the length of
acoustic encoding sequences while improving the accuracy and processing speed
of end-to-end models. This method achieves the same intent accuracy as the best
two-stage SLU recognition with complicated and time-consuming decoding but does
so at lower computational cost. This stacked end-to-end SLU system yields an
intent accuracy of 93.97% for the SmartLights far-field set, 95.18% for the
close-field set, and 99.71% for FluentSpeech.
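The abstract names connectionist temporal summarization (CTS) as a way to shorten acoustic encoding sequences, but does not spell out the mechanics here. The sketch below is a minimal, hypothetical reading of such frame summarization, assuming access to per-frame CTC posteriors: frames confidently predicted as CTC blank are dropped, and runs of frames sharing the same argmax label are averaged into one vector. The function name `summarize_frames`, the blank threshold, and the merging rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def summarize_frames(encodings, ctc_posteriors, blank_id=0, blank_thresh=0.9):
    """Shorten an acoustic encoding sequence using CTC posteriors.

    Hypothetical CTS-style summarization: frames dominated by the CTC
    blank are dropped, and consecutive frames sharing the same argmax
    label are averaged into a single summary vector.
    """
    labels = ctc_posteriors.argmax(axis=1)
    keep = ctc_posteriors[:, blank_id] < blank_thresh  # drop confident blanks

    summary, run, run_label = [], [], None
    for frame, label, kept in zip(encodings, labels, keep):
        if not kept:
            continue
        if run and label != run_label:
            summary.append(np.mean(run, axis=0))  # flush the finished run
            run = []
        run.append(frame)
        run_label = label
    if run:
        summary.append(np.mean(run, axis=0))
    return np.array(summary)

# Toy example: 6 frames, 4-dim encodings, 3 CTC classes (class 0 = blank).
rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 4))
post = np.array([
    [0.95, 0.03, 0.02],  # blank -> dropped
    [0.10, 0.85, 0.05],  # label 1
    [0.20, 0.75, 0.05],  # label 1 (merged with previous frame)
    [0.96, 0.02, 0.02],  # blank -> dropped
    [0.05, 0.05, 0.90],  # label 2
    [0.97, 0.02, 0.01],  # blank -> dropped
])
short = summarize_frames(enc, post)
print(short.shape)  # (2, 4): six frames summarized to two
```

A downstream intent classifier would then attend over the two summary vectors instead of all six frames, which is the source of the claimed speedup.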
Related papers
- Decoding-Time Language Model Alignment with Multiple Objectives [88.64776769490732]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
arXiv Detail & Related papers (2024-06-27T02:46:30Z)
- Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
- Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding [18.616202196061966]
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse from speech have become more promising recently.
This approach uses a single model that utilizes audio and text representations from pre-trained speech recognition (ASR) models.
We propose a novel E2E SLU system that enhances robustness to ASR errors by fusing audio and text representations based on the estimated modality confidence of ASR hypotheses.
arXiv Detail & Related papers (2023-07-22T17:47:31Z)
- A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale [64.10124092250126]
Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus.
In this work, we compare three state-of-the-art semi-supervised methods encompassing both unpaired text and audio as well as several of their combinations in a controlled setting.
We find that in our setting these methods offer many improvements beyond raw WER, including substantial gains in tail-word WER, decoder computation during inference, and lattice density.
arXiv Detail & Related papers (2023-04-19T18:09:27Z)
- Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks [5.66060067322059]
We benchmark three types of systems to perform the intent detection task.
We evaluate the systems on the publicly available SLURP spoken language resource corpus.
arXiv Detail & Related papers (2022-12-16T14:01:42Z)
- Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning.
Compared to the state-of-the-art scheme, the proposed scheme has drastically lower complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU).
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z)
- Boosting Continuous Sign Language Recognition via Cross Modality Augmentation [135.30357113518127]
Continuous sign language recognition deals with unaligned video-text pairs.
We propose a novel architecture with cross modality augmentation.
The proposed framework can be easily extended to other existing CTC based continuous SLR architectures.
arXiv Detail & Related papers (2020-10-11T15:07:50Z)
- Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm [0.0]
Online machine learning (OML) algorithms do not need any training phase and can be deployed directly in an unknown environment.
arXiv Detail & Related papers (2020-02-18T16:39:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.