End-to-end spoken language understanding using transformer networks and
self-supervised pre-trained features
- URL: http://arxiv.org/abs/2011.08238v1
- Date: Mon, 16 Nov 2020 19:30:52 GMT
- Title: End-to-end spoken language understanding using transformer networks and
self-supervised pre-trained features
- Authors: Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske and
Brian Kingsbury
- Abstract summary: Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP).
We introduce a modular End-to-End (E2E) SLU transformer-network-based architecture which allows the use of self-supervised pre-trained acoustic features.
- Score: 17.407912171579852
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformer networks and self-supervised pre-training have consistently
delivered state-of-the-art results in the field of natural language processing
(NLP); however, their merits in the field of spoken language understanding
(SLU) still need further investigation. In this paper, we introduce a modular
End-to-End (E2E) SLU transformer-network-based architecture which allows the
use of self-supervised pre-trained acoustic features, pre-trained model
initialization and multi-task training. Several SLU experiments for predicting
intent and entity labels/values using the ATIS dataset are performed. These
experiments investigate the interaction of pre-trained model initialization and
multi-task training with either traditional filterbank or self-supervised
pre-trained acoustic features. Results show not only that self-supervised
pre-trained acoustic features outperform filterbank features in almost all the
experiments, but also that when these features are used in combination with
multi-task training, they almost eliminate the necessity of pre-trained model
initialization.
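To make the setup concrete, below is a minimal sketch of a multi-task E2E SLU model of the kind the abstract describes: pre-extracted acoustic features (filterbank or self-supervised) feed a transformer encoder with separate intent and entity heads trained jointly. Module sizes, label counts, and the loss weighting are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a multi-task E2E SLU model (illustrative; not the paper's exact design).
# Assumes pre-extracted acoustic features, e.g. filterbank or self-supervised features,
# of shape (batch, frames, feat_dim); all dimensions and label counts are placeholders.
import torch
import torch.nn as nn

class SLUTransformer(nn.Module):
    def __init__(self, feat_dim=768, d_model=256, n_heads=4, n_layers=6,
                 n_intents=26, n_entity_tags=120):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=1024, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.intent_head = nn.Linear(d_model, n_intents)      # utterance-level intent
        self.entity_head = nn.Linear(d_model, n_entity_tags)  # frame-level entity tags

    def forward(self, feats):
        h = self.encoder(self.input_proj(feats))         # (B, T, d_model)
        intent_logits = self.intent_head(h.mean(dim=1))  # pooled utterance representation
        entity_logits = self.entity_head(h)              # per-frame predictions
        return intent_logits, entity_logits

def multitask_loss(intent_logits, entity_logits, intent_y, entity_y, w_intent=0.5):
    """Multi-task objective: weighted sum of intent and entity losses (weight is a placeholder)."""
    ce = nn.CrossEntropyLoss()
    intent_loss = ce(intent_logits, intent_y)
    entity_loss = ce(entity_logits.transpose(1, 2), entity_y)  # (B, C, T) vs. (B, T)
    return w_intent * intent_loss + (1.0 - w_intent) * entity_loss
```

In this sketch, pre-trained model initialization would correspond to loading the encoder weights from an earlier (e.g. ASR-style) training run before the multi-task stage.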
Related papers
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in
Transformer Models [9.340409961107955]
Transformer models have the remarkable ability to perform in-context learning (ICL).
We study how effectively transformers can bridge between the components of their pretraining data mixture.
Our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases.
arXiv Detail & Related papers (2023-11-01T21:41:08Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
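As a rough illustration of the up-scaling idea, the combination below adds a small model pair's fine-tuning delta, in log-probability space, to a large base model's predictions; the exact combination rule and the final renormalization are assumptions based on this summary, not the authors' reference implementation.

```python
# Sketch of "LM up-scaling": apply the fine-tuning delta measured at small scale
# to a large pre-trained base model. All inputs are per-token log-probabilities
# over the same vocabulary; the combination rule here is an assumption.
import torch

def upscaled_logprobs(logp_base_large, logp_base_small, logp_ft_small):
    delta = logp_ft_small - logp_base_small      # what fine-tuning changed at small scale
    combined = logp_base_large + delta           # graft that change onto the large base model
    return torch.log_softmax(combined, dim=-1)   # renormalize into a proper distribution
```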
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularizer for the further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
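A generic form of such a self-distillation regularizer is sketched below: during the further pre-training stage, the model is penalized for drifting from a frozen teacher copy of itself via a temperature-scaled KL term added to the ordinary training loss. The weighting and temperature are illustrative, and the paper's exact objective may differ.

```python
# Generic self-distillation regularizer for a further pre-training stage
# (an illustration of the general technique, not necessarily the paper's recipe).
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, task_loss, alpha=0.1, tau=1.0):
    """Add a KL term keeping the student close to a frozen teacher copy of itself."""
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    return task_loss + alpha * kl  # alpha and tau are illustrative hyper-parameters
```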
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables the resulting Frozen Pretrained Transformer (FPT) to generalize zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
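One common way to realize this kind of minimal finetuning, assumed in the sketch below, is to freeze the pretrained self-attention and feed-forward blocks and train only thin input/output projections plus the layer norms; the specific backbone (GPT-2 via Hugging Face) and the classification head are illustrative choices, not necessarily the paper's setup.

```python
# Sketch of minimal finetuning of a frozen pretrained language model: freeze the
# transformer body, train only layer norms plus new input/output layers.
# Backbone choice and head design are illustrative assumptions.
import torch.nn as nn
from transformers import GPT2Model

class FrozenLMClassifier(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.lm = GPT2Model.from_pretrained("gpt2")
        for name, p in self.lm.named_parameters():
            p.requires_grad = "ln" in name          # keep layer norms trainable, freeze the rest
        d = self.lm.config.n_embd
        self.input_proj = nn.Linear(in_dim, d)      # new, trainable input layer
        self.output_head = nn.Linear(d, n_classes)  # new, trainable output layer

    def forward(self, x):                           # x: (batch, seq_len, in_dim)
        h = self.lm(inputs_embeds=self.input_proj(x)).last_hidden_state
        return self.output_head(h[:, -1])           # classify from the final position
```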
arXiv Detail & Related papers (2021-03-09T06:39:56Z)
- The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
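The core operation in such lottery-ticket analyses is magnitude pruning: keep the largest-magnitude pre-trained weights, zero out the rest, and test whether the resulting sparse subnetwork still transfers. A minimal sketch of that pruning step follows; the global thresholding and sparsity level are arbitrary illustrative choices.

```python
# Minimal global magnitude-pruning step used in lottery-ticket style analyses
# (illustrative; sparsity level and global thresholding are arbitrary choices).
import torch

def magnitude_prune(model, sparsity=0.8):
    weights = [p for n, p in model.named_parameters() if "weight" in n]
    all_vals = torch.cat([w.detach().abs().flatten() for w in weights]).sort().values
    threshold = all_vals[int(sparsity * (all_vals.numel() - 1))]  # global magnitude cutoff
    masks = []
    with torch.no_grad():
        for w in weights:
            mask = (w.abs() > threshold).float()
            w.mul_(mask)          # zero out pruned weights in place
            masks.append(mask)    # keep masks so the ticket can be re-applied after training
    return masks
```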
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
- Training Production Language Models without Memorizing User Data [7.004279935788177]
This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL).
We demonstrate the deployment of a differentially private mechanism for the training of a production neural network in FL.
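As a generic illustration of such a mechanism, the server-side aggregation below clips each client's model update and adds Gaussian noise to the average, in the style of differentially private federated averaging; the clipping norm and noise multiplier are placeholders, and this is not the production system described in the paper.

```python
# Generic clip-and-noise aggregation in the style of differentially private
# federated averaging (illustrative placeholders, not the paper's production system).
import torch

def dp_fedavg_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0):
    """client_updates: list of flattened model-update tensors, one per client."""
    clipped = [u * torch.clamp(clip_norm / (u.norm() + 1e-12), max=1.0)  # bound each update
               for u in client_updates]
    avg = torch.stack(clipped).mean(dim=0)
    noise_std = noise_multiplier * clip_norm / len(client_updates)
    return avg + torch.randn_like(avg) * noise_std  # noisy average applied by the server
```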
arXiv Detail & Related papers (2020-09-21T17:12:33Z)