SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks
- URL: http://arxiv.org/abs/2303.00733v1
- Date: Wed, 1 Mar 2023 18:47:41 GMT
- Title: SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks
- Authors: Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng,
Shang-Wen Li, Hung-yi Lee
- Abstract summary: We propose SpeechPrompt v2, a prompt tuning framework capable of performing a wide variety of speech classification tasks.
Experimental results show that SpeechPrompt v2 achieves performance on par with prior works with fewer than 0.15M trainable parameters.
- Score: 94.30385972442387
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt tuning is a technique that tunes a small set of parameters to steer a
pre-trained language model (LM) to directly generate the output for downstream
tasks. Recently, prompt tuning has demonstrated its storage and computation
efficiency in both natural language processing (NLP) and speech processing.
These advantages have also made prompt tuning a candidate approach for serving
a pre-trained LM across multiple tasks in a unified manner. For speech
processing, SpeechPrompt has shown high parameter efficiency and competitive
performance on a few speech classification tasks. However, whether SpeechPrompt
can serve a large number of tasks remains unanswered. In this work, we propose
SpeechPrompt v2, a prompt tuning framework capable of performing a wide variety
of speech classification tasks, covering multiple languages and prosody-related
tasks. Experimental results show that SpeechPrompt v2 achieves performance on
par with prior works with fewer than 0.15M trainable parameters in a unified
framework.
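To make the mechanism concrete, here is a minimal sketch of prompt tuning on a frozen speech LM in PyTorch. The model, embedding dimension, and prompt length are illustrative placeholders, not the actual SpeechPrompt v2 configuration; the point is that only a small prompt table is trained while the pre-trained LM stays frozen.

```python
import torch
import torch.nn as nn

class PromptTunedModel(nn.Module):
    """Prepends trainable prompt vectors to the inputs of a frozen speech LM."""

    def __init__(self, speech_lm: nn.Module, embed_dim: int, prompt_len: int = 5):
        super().__init__()
        self.speech_lm = speech_lm
        for p in self.speech_lm.parameters():
            p.requires_grad = False                  # freeze the pre-trained LM
        # the only trainable parameters: a small table of prompt embeddings
        self.prompts = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, unit_embeddings: torch.Tensor) -> torch.Tensor:
        # unit_embeddings: (batch, seq_len, embed_dim) embeddings of discrete speech units
        batch = unit_embeddings.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        inputs = torch.cat([prompts, unit_embeddings], dim=1)
        return self.speech_lm(inputs)                # LM produces outputs as usual

if __name__ == "__main__":
    # stand-in for a pre-trained speech LM (not the real model)
    layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
    dummy_lm = nn.TransformerEncoder(layer, num_layers=2)
    model = PromptTunedModel(dummy_lm, embed_dim=768, prompt_len=5)

    x = torch.randn(2, 50, 768)                      # fake unit embeddings
    out = model(x)                                   # shape (2, 55, 768)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(out.shape, trainable)                      # 5 * 768 = 3840 trainable parameters
```

Counting only `requires_grad` parameters, as in the last line, is how a budget like the "fewer than 0.15M trainable parameters" cited in the abstract is measured.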
Related papers
- SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks [94.10497337235083]
We are the first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks; a toy sketch of this label-to-unit reformulation appears after this list.
We show that the prompting method can achieve competitive performance compared to the strong fine-tuning method.
arXiv Detail & Related papers (2024-08-23T13:00:10Z)
- PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models [19.719401865551745]
We present PolySpeech, a multitask speech model that supports speech recognition, speech synthesis, and two speech classification tasks.
PolySpeech shows competitiveness across various tasks compared to single-task models.
arXiv Detail & Related papers (2024-06-12T01:35:46Z)
- SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition [67.08798754009153]
Speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.
We propose a novel decoder-only speech language model, SpeechComposer, that can unify common speech tasks by composing a fixed set of prompt tokens.
arXiv Detail & Related papers (2024-01-31T18:06:29Z)
- SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts [108.04306136086807]
We present research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen.
The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs.
arXiv Detail & Related papers (2023-06-03T22:35:27Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional fine-tuning with only 0.5%-1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
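The SpeechPrompt line of work (including SpeechPrompt v2) casts classification as speech-to-unit generation: each class label is verbalized into a short sequence of discrete unit IDs, the prompted LM is trained to generate that sequence, and generated units are mapped back to a label at inference. The toy mapping below invents the task, labels, and unit IDs purely for illustration; this hand-fixed table is not taken from the papers, which map labels and units in a more principled way.

```python
from typing import Dict, List, Tuple

# Toy verbalizer for a hypothetical keyword-spotting task; unit IDs are invented.
LABEL_TO_UNITS: Dict[str, Tuple[int, ...]] = {
    "yes":  (12, 87),
    "no":   (45, 3),
    "stop": (7, 91),
}
UNITS_TO_LABEL = {units: label for label, units in LABEL_TO_UNITS.items()}

def target_units(label: str) -> List[int]:
    """Unit sequence the prompted LM is trained to generate for this label."""
    return list(LABEL_TO_UNITS[label])

def decode_label(generated: List[int]) -> str:
    """Map the first generated units back to a class label."""
    return UNITS_TO_LABEL.get(tuple(generated[:2]), "unknown")

print(target_units("yes"))        # [12, 87]
print(decode_label([12, 87, 5]))  # 'yes'
```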
This list is automatically generated from the titles and abstracts of the papers on this site.