Style Attuned Pre-training and Parameter Efficient Fine-tuning for
Spoken Language Understanding
- URL: http://arxiv.org/abs/2010.04355v1
- Date: Fri, 9 Oct 2020 03:53:37 GMT
- Title: Style Attuned Pre-training and Parameter Efficient Fine-tuning for
Spoken Language Understanding
- Authors: Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li
- Abstract summary: We introduce a novel framework for learning spoken language understanding.
The framework consists of a conversational language modeling (CLM) pre-training task and a light encoder architecture.
With the framework, we match the performance of state-of-the-art SLU results on Alexa internal datasets and on two public ones, adding only 4.4% parameters per task.
- Score: 19.105304214638075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models have yielded state-of-the-art results on spoken
language understanding (SLU) problems; however, these models require a
significant amount of domain-specific labeled examples for training, which is
prohibitively expensive. While pre-trained language models like BERT have been
shown to capture a massive amount of knowledge by learning from unlabeled
corpora and to solve SLU using fewer labeled examples for adaptation, the encoding
of knowledge is implicit and agnostic to downstream tasks. Such encoding
results in model inefficiencies in parameter usage: an entirely new model is
required for every domain. To address these challenges, we introduce a novel
SLU framework, comprising a conversational language modeling (CLM) pre-training
task and a light encoder architecture. The CLM pre-training enables networks to
capture representations of conversational-style language in the
presence of ASR errors. The light encoder architecture separates the shared
pre-trained networks from the mappings of generally encoded knowledge to
specific domains of SLU, allowing for the domain adaptation to be performed
solely at the light encoder and thus increasing efficiency. With the framework,
we match the performance of state-of-the-art SLU results on Alexa internal
datasets and on two public ones (ATIS, SNIPS), adding only 4.4% parameters per
task.
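The separation described above can be sketched in code: a shared CLM-pretrained backbone is frozen and reused across all domains, while a small per-domain "light encoder" and task heads are the only trainable parameters. The following is a hypothetical PyTorch sketch of that idea; the module names, layer sizes, and the choice of a single Transformer layer as the light encoder are illustrative assumptions, not the authors' exact configuration.
```python
# Hypothetical sketch of the light-encoder idea: a frozen, shared pre-trained
# backbone plus a small trainable per-domain encoder and task heads.
# All module sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn


class LightEncoderSLU(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int,
                 num_intents: int, num_slot_tags: int):
        super().__init__()
        # Shared backbone (e.g. a CLM-pretrained Transformer encoder); frozen
        # so the same copy can serve every SLU domain.
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Per-domain "light encoder": the only part trained during adaptation,
        # keeping the added parameters per task small.
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4,
                                           dim_feedforward=2 * hidden_dim,
                                           batch_first=True)
        self.light_encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.intent_head = nn.Linear(hidden_dim, num_intents)
        self.slot_head = nn.Linear(hidden_dim, num_slot_tags)

    def forward(self, embeddings: torch.Tensor):
        # embeddings: (batch, seq_len, hidden_dim) token representations fed
        # to the shared backbone; no gradients flow into the frozen part.
        with torch.no_grad():
            shared = self.backbone(embeddings)
        adapted = self.light_encoder(shared)
        intent_logits = self.intent_head(adapted[:, 0])  # utterance-level intent
        slot_logits = self.slot_head(adapted)            # token-level slot tags
        return intent_logits, slot_logits
```
With a multi-layer backbone and a single-layer light encoder of the same width, the trainable share per domain falls in the low single-digit percent range, in the spirit of the roughly 4.4% parameters per task reported above.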
Related papers
- DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning [13.601154787754046]
DRCap is a data-efficient and flexible zero-shot audio captioning system.
It requires text-only data for training and can quickly adapt to new domains without additional fine-tuning.
arXiv Detail & Related papers (2024-10-12T10:21:00Z)
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z)
- Investigating Decoder-only Large Language Models for Speech-to-text Translation [39.17113782374464]
Large language models (LLMs) are known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains.
We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation.
Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data.
arXiv Detail & Related papers (2024-07-03T14:42:49Z)
- Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by these LLM-based symbolic programs (LSPs) is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and to other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
- Graph Neural Prompting with Large Language Models [32.97391910476073]
Graph Neural Prompting (GNP) is a novel plug-and-play method to assist pre-trained language models in learning beneficial knowledge from knowledge graphs.
Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks.
arXiv Detail & Related papers (2023-09-27T06:33:29Z)
- In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
- On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.