Adapting Pretrained Transformer to Lattices for Spoken Language Understanding
- URL: http://arxiv.org/abs/2011.00780v1
- Date: Mon, 2 Nov 2020 07:14:34 GMT
- Title: Adapting Pretrained Transformer to Lattices for Spoken Language Understanding
- Authors: Chao-Wei Huang and Yun-Nung Chen
- Abstract summary: It has been shown that encoding lattices, as opposed to the 1-best results generated by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU).
This paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language.
- Score: 39.50831917042577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lattices are compact representations that encode multiple hypotheses, such as speech recognition results or different word segmentations. It has been shown that encoding lattices, as opposed to the 1-best results generated by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU). Recently, pretrained language models with the transformer architecture have achieved state-of-the-art results on natural language understanding, but their ability to encode lattices has not been explored. Therefore, this paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language. Our experiments on the benchmark ATIS dataset show that fine-tuning pretrained transformers with lattice inputs yields a clear improvement over fine-tuning with 1-best results. Further evaluation demonstrates the effectiveness of our methods under different acoustic conditions. Our code is available at https://github.com/MiuLab/Lattice-SLU
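As a rough illustration of what "adapting a pretrained transformer to lattice inputs" can look like, the sketch below linearizes lattice nodes into a token sequence and restricts self-attention with a reachability mask, so that competing hypotheses on different lattice paths never attend to each other. This is an assumption-laden sketch of the general idea, not the authors' implementation (their code is at the GitHub link above); all names, shapes, and the example lattice are invented for illustration.

```python
# Minimal sketch (not the paper's exact method): feed a lattice to a transformer
# encoder by linearizing its nodes and masking attention with lattice reachability.
import torch

def reachability_mask(num_nodes, edges):
    """Boolean (num_nodes, num_nodes) mask: True where attention is allowed.

    `edges` lists (src, dst) pairs of a DAG whose nodes are topologically
    ordered, as in an ASR word lattice.
    """
    reach = torch.eye(num_nodes, dtype=torch.bool)
    for src, dst in edges:
        reach[src, dst] = True
    # transitive closure: i reaches j if i reaches some k that reaches j
    for k in range(num_nodes):
        reach = reach | (reach[:, k:k + 1] & reach[k:k + 1, :])
    # allow attention in both directions along any path (bidirectional encoder)
    return reach | reach.T

# Example: a 4-node lattice with two competing hypotheses between nodes 0 and 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
mask = reachability_mask(4, edges)

# Convert to an additive attention mask and apply a standard encoder layer.
attn_bias = torch.zeros(4, 4).masked_fill(~mask, float("-inf"))
layer = torch.nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
hidden = torch.randn(1, 4, 16)            # stand-in for pretrained token embeddings
out = layer(hidden, src_mask=attn_bias)   # nodes 1 and 2 (rival paths) never interact
print(out.shape)                          # torch.Size([1, 4, 16])
```

A full adaptation would also have to handle positional information for lattice nodes and reuse the pretrained model's own embeddings and weights; the sketch only shows the masking idea.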
Related papers
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pretrained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Syntax-guided Localized Self-attention by Constituency Syntactic Distance [26.141356981833862]
We propose a syntax-guided localized self-attention for Transformer.
It allows directly incorporating grammar structures from an external constituency parser.
Experimental results show that our model consistently improves translation performance.
arXiv Detail & Related papers (2022-10-21T06:37:25Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Speech-language Pre-training for End-to-end Spoken Language Understanding [18.548949994603213]
We propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder.
The experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method.
arXiv Detail & Related papers (2021-02-11T21:55:48Z)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held that (supervised) syntactic parsing is necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer.
As a result, the network can better adapt to the variable distributions present in speech data; a generic sketch of relative positional attention appears after this list.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification [8.900866276512364]
We propose Stacked DeBERT, short for Stacked Denoising Bidirectional Representations from Transformers.
Our model shows improved F1-scores and better robustness on the informal/incorrect texts found in tweets and on texts with Speech-to-Text errors, in sentiment and intent classification tasks.
arXiv Detail & Related papers (2020-01-01T04:49:23Z)
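For the "Relative Positional Encoding for Speech Recognition and Direct Translation" entry above, the following is a generic single-head sketch of relative positional attention, where learned biases indexed by clipped query-key offsets are added to the attention logits. It illustrates the general scheme only; the exact formulation in that paper may differ, and all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    """Single-head self-attention with a learned relative-position bias."""

    def __init__(self, d_model, max_dist=64):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.max_dist = max_dist
        # one learned scalar bias per clipped relative offset in [-max_dist, max_dist]
        self.rel_bias = nn.Embedding(2 * max_dist + 1, 1)

    def forward(self, x):                                  # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        pos = torch.arange(x.size(1), device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        scores = scores + self.rel_bias(rel + self.max_dist).squeeze(-1)
        return F.softmax(scores, dim=-1) @ v

attn = RelativeSelfAttention(d_model=16)
out = attn(torch.randn(2, 10, 16))                         # (2, 10, 16)
```

Because the bias depends only on the distance between positions, the same parameters apply to sequences of any length, which is one reason relative schemes are attractive for variable-length speech inputs.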
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.