Related papers: DIET: Lightweight Language Understanding for Dialogue Systems

DIET: Lightweight Language Understanding for Dialogue Systems

URL: http://arxiv.org/abs/2004.09936v3
Date: Mon, 11 May 2020 06:13:56 GMT
Title: DIET: Lightweight Language Understanding for Dialogue Systems
Authors: Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol
Abstract summary: Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.

Related papers

Platonic Grounding for Efficient Multimodal Language Models [22.715168904364756]
We motivate and propose a simple modification to existing multimodal frameworks that rely on aligning pretrained models. Our work also has implications for combining pretrained models into larger systems efficiently.
arXiv Detail & Related papers (2025-04-27T18:56:26Z)
MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models [79.0546136194314]
We present a novel instruction tuning recipe to improve the zero-shot task generalization of multimodal large language models. We evaluate the performance of the proposed approach on 9 unseen datasets across both language and vision modalities.
arXiv Detail & Related papers (2024-11-15T20:09:59Z)
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$textEVL_textGen$ is a framework designed for the pre-training of visually conditioned language generation models. We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Efficient Speech Translation with Pre-trained Models [13.107314023500349]
We investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models. While the end-to-end models show superior translation performance to cascaded ones, the application of this technology has a limitation on the need for additional end-to-end training data.
arXiv Detail & Related papers (2022-11-09T15:07:06Z)
Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed. We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization [19.89228774074371]
We propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of input text. Our method can be applied to various pre-trained language models and improve their representation capability.
arXiv Detail & Related papers (2021-08-02T12:08:19Z)
A Comprehensive Comparison of Pre-training Language Models [0.5139874302398955]
We pre-train a list of transformer-based models with the same amount of text and the same training steps. The experimental results show that the most improvement upon the origin BERT is adding the RNN-layer to capture more contextual information for short text understanding.
arXiv Detail & Related papers (2021-06-22T02:12:29Z)
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training. We propose a new pre-training task based on contrastive learning. By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.