An Effective Non-Autoregressive Model for Spoken Language Understanding
- URL: http://arxiv.org/abs/2108.07005v1
- Date: Mon, 16 Aug 2021 10:26:57 GMT
- Title: An Effective Non-Autoregressive Model for Spoken Language Understanding
- Authors: Lizhi Cheng, Weijia Jia, Wenmian Yang
- Abstract summary: We propose a novel non-autoregressive Spoken Language Understanding model named Layered-Refine Transformer.
With SLG, the non-autoregressive model can efficiently obtain dependency information during training while spending no extra time at inference.
Experiments on two public datasets indicate that our model significantly improves SLU performance (by 1.5% in overall accuracy) while substantially speeding up the inference process (by more than 10 times).
- Score: 15.99246711701726
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Spoken Language Understanding (SLU), a core component of the task-oriented
dialogue system, demands low inference latency because human users are
impatient. Non-autoregressive SLU models clearly increase inference speed but
suffer from the uncoordinated-slot problem caused by the lack of sequential
dependency information among slot chunks. To address this shortcoming, in this paper, we
propose a novel non-autoregressive SLU model named Layered-Refine Transformer,
which contains a Slot Label Generation (SLG) task and a Layered Refine
Mechanism (LRM). SLG is defined as generating the next slot label given the
token sequence and the previously generated slot labels. With SLG, the
non-autoregressive model can efficiently obtain dependency information during
training while spending no extra time at inference. LRM predicts preliminary SLU results from the
Transformer's middle states and utilizes them to guide the final prediction.
Experiments on two public datasets indicate that our model significantly
improves SLU performance (by 1.5\% in overall accuracy) while substantially
speeding up the inference process (by more than 10 times) over the state-of-the-art
baseline.
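As a rough illustration of the two components named in the abstract, the sketch below pairs an intermediate slot classifier whose preliminary predictions are fed back to the upper layers (the LRM idea) with a training-only head that scores the next slot label from the token sequence and the labels generated so far (the SLG idea). Layer sizes, the fusion scheme, and all class and variable names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LayeredRefineSketch(nn.Module):
    def __init__(self, vocab_size, num_slots, d_model=256, nhead=4, n_lower=2, n_upper=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.lower = nn.TransformerEncoder(make_layer(), n_lower)
        self.upper = nn.TransformerEncoder(make_layer(), n_upper)
        # LRM idea: classify slots from the middle states, then feed those
        # preliminary labels back in to guide the remaining layers.
        self.mid_slot_head = nn.Linear(d_model, num_slots)
        self.slot_embed = nn.Embedding(num_slots, d_model)
        self.final_slot_head = nn.Linear(d_model, num_slots)
        # SLG idea (training only): score the next slot label from the token
        # sequence and the slot labels generated so far.
        self.slg_head = nn.Linear(2 * d_model, num_slots)

    def forward(self, token_ids):
        h_mid = self.lower(self.embed(token_ids))         # Transformer middle states
        prelim_logits = self.mid_slot_head(h_mid)         # preliminary SLU result
        guided = h_mid + self.slot_embed(prelim_logits.argmax(-1))
        final_logits = self.final_slot_head(self.upper(guided))
        return prelim_logits, final_logits                # all slots predicted in parallel

    def slg_logits(self, token_ids, prev_slot_labels):
        """Auxiliary SLG scores: next slot label given the utterance and the
        labels generated so far (adds dependency information during training)."""
        utt = self.lower(self.embed(token_ids)).mean(dim=1)
        hist = self.slot_embed(prev_slot_labels).mean(dim=1)
        return self.slg_head(torch.cat([utt, hist], dim=-1))

model = LayeredRefineSketch(vocab_size=1000, num_slots=30)
tokens = torch.randint(0, 1000, (2, 12))
prelim, final = model(tokens)                                  # non-autoregressive pass
slg = model.slg_logits(tokens, torch.randint(0, 30, (2, 4)))   # training-only SLG scores
print(prelim.shape, final.shape, slg.shape)                    # (2,12,30) (2,12,30) (2,30)
```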
Related papers
- Mixture of Attentions For Speculative Decoding [17.344416130742232]
Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified in parallel by the large language model.
We identify several limitations of SD models including the lack of on-policyness during training and partial observability.
We propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD.
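For readers unfamiliar with speculative decoding itself, the toy loop below shows the generic propose-then-verify pattern the summary refers to; `draft_next` and `target_next` are hypothetical stand-ins for the small and large models, and nothing here reflects the paper's Mixture of Attentions architecture.

```python
from typing import Callable, List

def speculative_decode(prompt: List[int],
                       draft_next: Callable[[List[int]], int],
                       target_next: Callable[[List[int]], int],
                       k: int = 4, max_new: int = 32) -> List[int]:
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # 1) The small draft model cheaply proposes k future tokens.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) The large model checks each proposed position (in parallel in
        #    practice; sequentially here for clarity) and keeps the longest
        #    agreeing prefix, then contributes its own correction token.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(out + draft[:i]) == t:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        if accepted < k:
            out.append(target_next(out))
    return out[:len(prompt) + max_new]

# Toy usage: both "models" simply continue the arithmetic sequence.
print(speculative_decode([1, 2, 3],
                         draft_next=lambda ctx: ctx[-1] + 1,
                         target_next=lambda ctx: ctx[-1] + 1,
                         max_new=5))
```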
arXiv Detail & Related papers (2024-10-04T10:25:52Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding [44.77985942208969]
PRoDeliberation is a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models.
We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability to correct Automatic Speech Recognition (ASR) mistranscriptions.
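The snippet below is only a reminder of how greedy CTC collapse turns frame-level predictions into an output sequence in parallel; it is a generic illustration, not PRoDeliberation's decoder.

```python
BLANK = "<b>"

def ctc_greedy_collapse(frame_labels):
    """Collapse repeated frame-level labels and drop blanks, so every output
    position is produced in parallel rather than token by token."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Frame-wise argmax labels for a short utterance:
frames = ["<b>", "play", "play", "<b>", "some", "some", "<b>", "jazz", "jazz"]
print(ctc_greedy_collapse(frames))  # ['play', 'some', 'jazz']
```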
arXiv Detail & Related papers (2024-06-12T02:46:17Z)
- LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence [68.27280750612204]
We introduce the large auto-regressive model (LARM) for embodied agents.
LARM uses both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner.
Adopting a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft.
arXiv Detail & Related papers (2024-05-27T17:59:32Z)
- Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models [94.30953696090758]
We build compositional end-to-end spoken language understanding systems.
By relying on intermediate decoders trained for ASR, our end-to-end systems transform the input modality from speech to token-level representations.
Our models outperform both cascaded and direct end-to-end models on a labeling task of named entity recognition.
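A shape-only sketch of the compositional idea (an ASR stage producing token-level hypotheses, then a labeling stage on top); both stages below are hypothetical stand-ins, not the paper's models.

```python
from typing import Callable, List, Tuple

def compositional_slu(audio,
                      asr_decode: Callable[[object], List[str]],
                      tag_tokens: Callable[[List[str]], List[str]]) -> List[Tuple[str, str]]:
    tokens = asr_decode(audio)      # speech -> token-level representation
    tags = tag_tokens(tokens)       # token-level labeling (e.g., NER) on ASR output
    return list(zip(tokens, tags))

# Toy usage with rule-based stand-ins for both stages.
fake_asr = lambda _: ["call", "alice", "tomorrow"]
fake_tagger = lambda toks: ["B-PER" if t == "alice" else "O" for t in toks]
print(compositional_slu(b"...", fake_asr, fake_tagger))
```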
arXiv Detail & Related papers (2022-10-27T19:33:18Z)
- STOP: A dataset for Spoken Task Oriented Semantic Parsing [66.14615249745448]
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model.
We release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset.
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
arXiv Detail & Related papers (2022-06-29T00:36:34Z)
- Capture Salient Historical Information: A Fast and Accurate Non-Autoregressive Model for Multi-turn Spoken Language Understanding [18.988599232838766]
Existing work increases inference speed by designing non-autoregressive models for single-turn Spoken Language Understanding tasks.
We propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT).
SHA captures historical information for the current dialogue from both historical utterances and results via a well-designed history-attention mechanism.
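A bare-bones illustration of attending over encoded dialogue history; the shapes and scoring below are assumptions for the sketch, not SHA-LRT's actual mechanism.

```python
import torch
import torch.nn.functional as F

def history_attention(current, history):
    """current: (d,) vector for the current utterance.
    history: (n, d) matrix of encoded historical utterances and their SLU results.
    Returns a history summary weighted by relevance to the current turn."""
    scores = history @ current / current.shape[-1] ** 0.5   # (n,)
    weights = F.softmax(scores, dim=-1)
    return weights @ history                                # (d,)

cur = torch.randn(64)
hist = torch.randn(3, 64)                  # three previous turns
print(history_attention(cur, hist).shape)  # torch.Size([64])
```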
arXiv Detail & Related papers (2022-06-24T10:45:32Z)
- Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding [16.381644007368763]
End-to-end Spoken Language Understanding (E2E SLU) has attracted increasing interest due to its advantages of joint optimization and low latency.
We propose a streamable multi-task semantic transducer model to address these considerations.
Our proposed architecture predicts ASR and NLU labels auto-regressively and uses a semantic decoder to ingest both previously predicted word-pieces and slot tags.
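A rough sketch of a decoder step that ingests both the previous word-piece and the previous slot tag; the GRU cell, vocabularies, and fusion are assumptions rather than the paper's transducer architecture.

```python
import torch
import torch.nn as nn

class SemanticDecoderSketch(nn.Module):
    def __init__(self, n_wp, n_tags, d=128):
        super().__init__()
        self.wp_emb = nn.Embedding(n_wp, d)
        self.tag_emb = nn.Embedding(n_tags, d)
        self.rnn = nn.GRUCell(2 * d, d)
        self.wp_out = nn.Linear(d, n_wp)
        self.tag_out = nn.Linear(d, n_tags)

    def step(self, prev_wp, prev_tag, state):
        # Condition on the previous word-piece AND the previous slot tag.
        x = torch.cat([self.wp_emb(prev_wp), self.tag_emb(prev_tag)], dim=-1)
        state = self.rnn(x, state)
        return self.wp_out(state), self.tag_out(state), state

dec = SemanticDecoderSketch(n_wp=100, n_tags=10)
state = torch.zeros(1, 128)
wp, tag = torch.tensor([0]), torch.tensor([0])
for _ in range(3):  # auto-regressive roll-out of ASR word-pieces and slot tags
    wp_logits, tag_logits, state = dec.step(wp, tag, state)
    wp, tag = wp_logits.argmax(-1), tag_logits.argmax(-1)
print(wp.item(), tag.item())
```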
arXiv Detail & Related papers (2022-04-01T16:38:56Z)
- Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN [98.4713940310056]
One major task of spoken language understanding (SLU) in modern personal assistants is to extract semantic concepts from an utterance.
Recent research collected question and answer annotated data to learn what is unknown and should be asked.
We incorporate softmax-based slot filling neural architectures to model the sequence uncertainty without question supervision.
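As a generic illustration of Dirichlet-style token uncertainty, the snippet below uses the standard evidential recipe (concentrations derived from logits); it is not necessarily the calibration used in the paper.

```python
import numpy as np

def token_uncertainty(logits):
    """logits: (seq_len, num_slot_labels) per-token slot scores.
    Returns a per-token uncertainty in (0, 1]; near-flat concentrations give a
    high value, which can flag slot values carrying unknown concepts."""
    alpha = np.exp(logits) + 1.0       # Dirichlet concentration parameters
    k = logits.shape[-1]
    return k / alpha.sum(axis=-1)      # larger = less evidence = more uncertain

logits = np.array([[5.0, 0.1, 0.2],    # confident token
                   [0.3, 0.2, 0.1]])   # ambiguous token (likely unknown concept)
print(token_uncertainty(logits))       # the second value is much larger
```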
arXiv Detail & Related papers (2020-10-16T02:12:30Z)
- Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
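A toy version of per-word adaptive depth in the spirit of Adaptive Computation Time; the update and halting networks below are made-up stand-ins, not the paper's S-LSTM.

```python
import torch
import torch.nn as nn

def depth_adaptive_update(words, max_steps=6, threshold=0.99):
    """words: (n, d) word states. Each word keeps receiving recurrent updates
    until its accumulated halting score crosses the threshold."""
    n, d = words.shape
    halt_net = nn.Linear(d, 1)     # scores how "done" each word is
    update = nn.Linear(d, d)       # stand-in for one recurrent step
    halted = torch.zeros(n)
    steps_used = torch.zeros(n, dtype=torch.long)
    h = words
    for _ in range(max_steps):
        active = halted < threshold                   # words still computing
        if not active.any():
            break
        h = torch.where(active.unsqueeze(1), torch.tanh(update(h)), h)
        halted = halted + torch.sigmoid(halt_net(h)).squeeze(1) * active
        steps_used += active.long()
    return h, steps_used                              # per-word depth actually used

h, steps = depth_adaptive_update(torch.randn(5, 32))
print(steps.tolist())   # words can stop after different numbers of steps
```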
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.