Adapting Pretrained Transformer to Lattices for Spoken Language Understanding
- URL: http://arxiv.org/abs/2011.00780v1
- Date: Mon, 2 Nov 2020 07:14:34 GMT
- Title: Adapting Pretrained Transformer to Lattices for Spoken Language Understanding
- Authors: Chao-Wei Huang and Yun-Nung Chen
- Abstract summary: It has been shown that encoding lattices, as opposed to the 1-best results generated by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU).
This paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language.
- Score: 39.50831917042577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lattices are compact representations that encode multiple hypotheses, such as speech recognition results or different word segmentations. It has been shown that encoding lattices, as opposed to the 1-best results generated by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU). Recently, pretrained language models with the transformer architecture have achieved state-of-the-art results on natural language understanding, but their ability to encode lattices has not been explored. Therefore, this paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language. Our experiments on the benchmark ATIS dataset show that fine-tuning pretrained transformers with lattice inputs yields a clear improvement over fine-tuning with 1-best results. Further evaluation demonstrates the effectiveness of our methods under different acoustic conditions. Our code is available at https://github.com/MiuLab/Lattice-SLU
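As a rough illustration of what "adapting a pretrained transformer to lattice inputs" can look like, the sketch below linearizes lattice nodes into a token sequence and restricts self-attention with a reachability mask, so that competing hypotheses on different lattice paths never attend to each other. This is an assumption-laden sketch of the general idea, not the authors' implementation (their code is at the GitHub link above); all names, shapes, and the example lattice are invented for illustration.

```python
# Minimal sketch (not the paper's exact method): feed a lattice to a transformer
# encoder by linearizing its nodes and masking attention with lattice reachability.
import torch

def reachability_mask(num_nodes, edges):
    """Boolean (num_nodes, num_nodes) mask: True where attention is allowed.

    `edges` lists (src, dst) pairs of a DAG whose nodes are topologically
    ordered, as in an ASR word lattice.
    """
    reach = torch.eye(num_nodes, dtype=torch.bool)
    for src, dst in edges:
        reach[src, dst] = True
    # transitive closure: i reaches j if i reaches some k that reaches j
    for k in range(num_nodes):
        reach = reach | (reach[:, k:k + 1] & reach[k:k + 1, :])
    # allow attention in both directions along any path (bidirectional encoder)
    return reach | reach.T

# Example: a 4-node lattice with two competing hypotheses between nodes 0 and 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
mask = reachability_mask(4, edges)

# Convert to an additive attention mask and apply a standard encoder layer.
attn_bias = torch.zeros(4, 4).masked_fill(~mask, float("-inf"))
layer = torch.nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
hidden = torch.randn(1, 4, 16)            # stand-in for pretrained token embeddings
out = layer(hidden, src_mask=attn_bias)   # nodes 1 and 2 (rival paths) never interact
print(out.shape)                          # torch.Size([1, 4, 16])
```

A full adaptation would also have to handle positional information for lattice nodes and reuse the pretrained model's own embeddings and weights; the sketch only shows the masking idea.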
Related papers
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pretrained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Syntax-guided Localized Self-attention by Constituency Syntactic Distance [26.141356981833862]
We propose a syntax-guided localized self-attention for Transformer.
It allows directly incorporating grammar structures from an external constituency parser.
Experimental results show that our model consistently improves translation performance.
arXiv Detail & Related papers (2022-10-21T06:37:25Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Speech-language Pre-training for End-to-end Spoken Language Understanding [18.548949994603213]
We propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder.
The experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method.
arXiv Detail & Related papers (2021-02-11T21:55:48Z)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held that (supervised) syntactic parsing is necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer.
As a result, the network can better adapt to the variable distributions present in speech data; a generic sketch of relative positional attention appears after this list.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
- Stacked DeBERT: All Attention in Incomplete Data for Text Classification [8.900866276512364]
We propose Stacked DeBERT, short for Stacked Denoising Bidirectional Representations from Transformers.
Our model shows improved F1-scores and better robustness on the informal/incorrect texts found in tweets and on texts with Speech-to-Text errors, in sentiment and intent classification tasks.
arXiv Detail & Related papers (2020-01-01T04:49:23Z)
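For the "Relative Positional Encoding for Speech Recognition and Direct Translation" entry above, the following is a generic single-head sketch of relative positional attention, where learned biases indexed by clipped query-key offsets are added to the attention logits. It illustrates the general scheme only; the exact formulation in that paper may differ, and all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    """Single-head self-attention with a learned relative-position bias."""

    def __init__(self, d_model, max_dist=64):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.max_dist = max_dist
        # one learned scalar bias per clipped relative offset in [-max_dist, max_dist]
        self.rel_bias = nn.Embedding(2 * max_dist + 1, 1)

    def forward(self, x):                                  # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        pos = torch.arange(x.size(1), device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        scores = scores + self.rel_bias(rel + self.max_dist).squeeze(-1)
        return F.softmax(scores, dim=-1) @ v

attn = RelativeSelfAttention(d_model=16)
out = attn(torch.randn(2, 10, 16))                         # (2, 10, 16)
```

Because the bias depends only on the distance between positions, the same parameters apply to sequences of any length, which is one reason relative schemes are attractive for variable-length speech inputs.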
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.