An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
- URL: http://arxiv.org/abs/2203.16773v1
- Date: Thu, 31 Mar 2022 03:26:55 GMT
- Title: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
- Authors: Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee
- Abstract summary: We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experimental results show that prompt tuning achieves competitive performance on speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
- Score: 112.1942546460814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech representations learned from self-supervised learning (SSL) models
have been found beneficial for various speech processing tasks. However,
utilizing SSL representations usually requires fine-tuning the pre-trained
models or designing task-specific downstream models and loss functions, which
incurs considerable memory usage and human labor. On the other hand, prompting
in Natural Language Processing (NLP) is an efficient and widely used technique
for leveraging pre-trained language models (LMs), yet this paradigm has
received little study in the speech community. In this paper, we report the
first exploration of the prompt tuning paradigm for speech processing tasks
based on the Generative Spoken Language Model (GSLM). Experimental results show
that prompt tuning achieves competitive performance on speech classification
tasks with fewer trainable parameters than fine-tuning specialized downstream
models. We further study the technique on more challenging sequence generation
tasks, where prompt tuning also demonstrates its potential; its limitations and
possible research directions are discussed in this paper.
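To make the paradigm concrete: the idea is to keep the pre-trained spoken LM frozen and prepend a small number of trainable continuous prompt vectors to the embedded discrete speech units, so that only the prompts are optimized for each downstream task. The following is a minimal sketch under that reading; the class name, the generic `unit_lm`/`embed` interface, and all dimensions are illustrative assumptions, not the actual GSLM code.

```python
# Minimal prompt tuning sketch (PyTorch), assuming a pre-trained autoregressive
# LM over discrete speech units. Names and interfaces are illustrative
# placeholders, not the GSLM implementation.
import torch
import torch.nn as nn


class PromptTunedUnitLM(nn.Module):
    def __init__(self, unit_lm: nn.Module, embed: nn.Embedding, num_prompts: int = 10):
        super().__init__()
        self.unit_lm = unit_lm   # frozen pre-trained unit LM: (B, T, D) -> (B, T, vocab)
        self.embed = embed       # frozen pre-trained unit embedding table
        for p in list(unit_lm.parameters()) + list(embed.parameters()):
            p.requires_grad = False
        d_model = embed.embedding_dim
        # The only trainable parameters: continuous prompt vectors.
        self.prompts = nn.Parameter(torch.randn(num_prompts, d_model) * 0.02)

    def forward(self, units: torch.Tensor) -> torch.Tensor:
        # units: (batch, seq_len) discrete units, e.g. from quantized SSL features
        x = self.embed(units)                                     # (B, T, D)
        p = self.prompts.unsqueeze(0).expand(x.size(0), -1, -1)   # (B, P, D)
        x = torch.cat([p, x], dim=1)                              # prepend prompts
        return self.unit_lm(x)                                    # logits over units
```

Only `model.prompts` would be passed to the optimizer (for example, `torch.optim.Adam([model.prompts], lr=1e-3)`), which is what keeps the number of trainable parameters small compared with fine-tuning a specialized downstream model.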
Related papers
- Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z) - LAST: Language Model Aware Speech Tokenization [24.185165710384997]
We propose a novel approach to training a speech tokenizer by leveraging objectives from pre-trained textual LMs.
Our aim is to transform features from a pre-trained speech model into a new feature space that enables better clustering for speech LMs.
arXiv Detail & Related papers (2024-09-05T16:57:39Z) - SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are the first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks (a minimal sketch of this reformulation appears after this list).
We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z) - SpeechVerse: A Large-scale Generalizable Audio Language Model [38.67969337605572]
SpeechVerse is a robust multi-task training and curriculum learning framework.
It combines pre-trained speech and text foundation models via a small set of learnable parameters.
Our empirical experiments reveal that the multi-task SpeechVerse model outperforms conventional task-specific baselines on 9 of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z) - On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units, including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Speech Representation Learning Through Self-supervised Pretraining And
Multi-task Finetuning [63.38155671200249]
We show that MTL finetuning can further improve SSL pretraining.
We also examine whether the speech representation learned by supervised MTL finetuning generalizes to unseen tasks.
arXiv Detail & Related papers (2021-10-18T07:16:04Z) - Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)