An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks
- URL: http://arxiv.org/abs/2406.14747v1
- Date: Thu, 20 Jun 2024 21:39:04 GMT
- Title: An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks
- Authors: Varsha Suresh, Salah Aït-Mokhtar, Caroline Brun, Ioan Calapodescu
- Abstract summary: We investigate the potential of adapter-based fine-tuning in developing a unified model capable of handling multiple spoken language processing tasks.
We show that adapter-based fine-tuning enables a single encoder-decoder model to perform multiple speech processing tasks with an average improvement of 18.4%.
- Score: 3.015760169663536
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple speech-processing tasks. In this paper, we explore the potential of adapter-based fine-tuning in developing a unified model capable of effectively handling multiple spoken language processing tasks. The tasks we investigate are Automatic Speech Recognition, Phoneme Recognition, Intent Classification, Slot Filling, and Spoken Emotion Recognition. We validate our approach through a series of experiments on the SUPERB benchmark, and our results indicate that adapter-based fine-tuning enables a single encoder-decoder model to perform multiple speech processing tasks with an average improvement of 18.4% across the five target tasks while staying efficient in terms of parameter updates.
Related papers
- PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models [19.719401865551745]
We present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis, and two speech classification tasks.
PolySpeech shows competitiveness across various tasks compared to single-task models.
arXiv Detail & Related papers (2024-06-12T01:35:46Z)
- SpeechVerse: A Large-scale Generalizable Audio Language Model [38.67969337605572]
SpeechVerse is a robust multi-task training and curriculum learning framework.
It combines pre-trained speech and text foundation models via a small set of learnable parameters.
Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z)
- WavLLM: Towards Robust and Adaptive Speech Large Language Model [94.04010017961917]
We introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter.
We validate the proposed model on universal speech benchmarks including tasks such as ASR, ST, SV, ER, and also apply it to specialized datasets like Gaokao English listening comprehension set for SQA, and speech Chain-of-Thought (CoT) evaluation set.
arXiv Detail & Related papers (2024-03-31T12:01:32Z)
- SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition [67.08798754009153]
Speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.
We propose a novel decoder-only speech language model, SpeechComposer, that can unify common speech tasks by composing a fixed set of prompt tokens.
arXiv Detail & Related papers (2024-01-31T18:06:29Z)
- VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation [91.39949385661379]
VioLA is a single auto-regressive Transformer decoder-only network that unifies various cross-modal tasks involving speech and text.
We first convert all the speech utterances to discrete tokens using an offline neural encoder.
We further integrate task IDs (TID) and language IDs (LID) into the proposed model to enhance the modeling capability of handling different languages and tasks.
arXiv Detail & Related papers (2023-05-25T14:39:47Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning [14.239355474794142]
We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model.
We show that these tasks provide positive inductive biases to each other with the optimal contribution of each one relying on the severity of the noise from the task.
arXiv Detail & Related papers (2020-11-13T04:31:51Z)
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching [81.45928589522032]
We parameterize modular task-oriented dialog systems using a Transformer-based auto-regressive language model.
We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model.
Experiments show that SOLOIST creates new state-of-the-art on well-studied task-oriented dialog benchmarks.
arXiv Detail & Related papers (2020-05-11T17:58:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.