Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
 - URL: http://arxiv.org/abs/2407.09886v2
 - Date: Mon, 23 Sep 2024 16:45:04 GMT
 - Title: Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
 - Authors: Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, Ke-Han Lu, Hung-yi Lee
 - Abstract summary: Speech-Copilot is a modular framework for instruction-oriented speech-processing tasks.
It builds speech processing-specific toolsets by analyzing pre-collected task instructions.
It features a flexible agent based on large language models that performs tasks through program generation.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   In this work, we introduce Speech-Copilot, a modular framework for instruction-oriented speech-processing tasks that minimizes human effort in toolset construction. Unlike end-to-end methods using large audio-language models, Speech-Copilot builds speech processing-specific toolsets by analyzing pre-collected task instructions and breaking tasks into manageable sub-tasks. It features a flexible agent based on large language models that performs tasks through program generation. Our approach achieves state-of-the-art performance on the Dynamic-SUPERB benchmark, demonstrating its effectiveness across diverse speech-processing tasks. Key contributions include: 1) developing an innovative framework for speech processing-specific toolset construction, 2) establishing a high-performing agent based on large language models, and 3) offering a new perspective on addressing challenging instruction-oriented speech-processing tasks. Without additional training processes required by end-to-end approaches, our method provides a flexible and extendable solution for a wide range of speech-processing applications. 
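To make the pattern the abstract describes concrete, here is a minimal, self-contained sketch of an LLM agent that answers a spoken-language instruction by generating a short program over a modular toolset. The tool names, the `call_llm` stub, and the canned "generated" program are hypothetical placeholders for illustration, not the paper's actual toolset or prompts.

```python
# Minimal sketch of the Speech-Copilot-style pattern: modular speech tools
# plus an LLM agent that satisfies an instruction via program generation.
# All names below are illustrative placeholders, not the paper's toolset.

def transcribe(audio_path: str) -> str:
    """Speech-to-text module (stubbed; a real system would call an ASR model)."""
    return "hello world"

def classify_emotion(audio_path: str) -> str:
    """Speech-emotion-recognition module (stubbed)."""
    return "neutral"

TOOLS = {"transcribe": transcribe, "classify_emotion": classify_emotion}

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns a fixed program for illustration."""
    return (
        "text = transcribe(audio_path)\n"
        "emotion = classify_emotion(audio_path)\n"
        "result = f'Transcript: {text} (emotion: {emotion})'\n"
    )

def run_instruction(instruction: str, audio_path: str) -> str:
    # Task decomposition: the LLM sees the tool descriptions and writes a
    # program that composes them to satisfy the instruction.
    tool_docs = "\n".join(f"- {n}: {f.__doc__.splitlines()[0]}" for n, f in TOOLS.items())
    prompt = (
        f"Available tools:\n{tool_docs}\n"
        f"Instruction: {instruction}\n"
        "Write Python code that stores the answer in a variable named `result`."
    )
    program = call_llm(prompt)
    # Program generation + execution: run the code with the toolset in scope.
    scope = dict(TOOLS, audio_path=audio_path)
    exec(program, scope)  # sandbox this in any real deployment
    return scope["result"]

print(run_instruction("Transcribe the clip and report the speaker's emotion.", "clip.wav"))
```

Under this pattern, supporting a new task amounts to registering another tool and letting the agent compose it, which matches the abstract's claim that no additional end-to-end training is required.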
        Related papers
- ESPnet-SpeechLM: An Open Speech Language Model Toolkit (arXiv, 2025-02-21)
We present ESPnet-SpeechLM, an open toolkit designed to democratize the development of speech language models (SpeechLMs).
The toolkit standardizes speech processing tasks by framing them as universal sequential modeling problems.
With ESPnet-SpeechLM, users can easily define task templates and configure key settings, enabling seamless and streamlined SpeechLM development.
- SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks (arXiv, 2024-08-23)
We are the first to explore the potential of prompting speech LMs in the domain of speech processing.
We reformulate speech processing tasks into speech-to-unit generation tasks.
We show that the prompting method can achieve performance competitive with strong fine-tuning methods. (A minimal prompt-tuning sketch follows this list.)
- An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks (arXiv, 2024-06-20)
We investigate the potential of adapter-based fine-tuning in developing a unified model capable of handling multiple spoken language processing tasks.
We show that adapter-based fine-tuning enables a single encoder-decoder model to perform multiple speech processing tasks with an average improvement of 18.4%. (A generic adapter sketch follows this list.)
- SpeechVerse: A Large-scale Generalizable Audio Language Model (arXiv, 2024-05-14)
SpeechVerse is a robust multi-task training and curriculum learning framework.
It combines pre-trained speech and text foundation models via a small set of learnable parameters.
Our empirical experiments reveal that the multi-task SpeechVerse model outperforms conventional task-specific baselines on 9 of the 11 tasks.
- WavLLM: Towards Robust and Adaptive Speech Large Language Model (arXiv, 2024-03-31)
We introduce WavLLM, a robust and adaptive speech large language model with dual encoders and a prompt-aware LoRA weight adapter.
We validate the proposed model on universal speech benchmarks covering tasks such as ASR, ST, SV, and ER, and also apply it to specialized datasets such as a Gaokao English listening comprehension set for SQA and a speech Chain-of-Thought (CoT) evaluation set.
- SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition (arXiv, 2024-01-31)
Speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.
We propose a novel decoder-only speech language model, SpeechComposer, that can unify common speech tasks by composing a fixed set of prompt tokens.
- Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech (arXiv, 2023-09-18)
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions.
We present Dynamic-SUPERB, a benchmark for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion.
- Context-Aware Language Modeling for Goal-Oriented Dialogue Systems (arXiv, 2022-04-18)
We formulate goal-oriented dialogue as a partially observed Markov decision process (the standard POMDP tuple is given after this list).
We derive a simple and effective method to finetune language models in a goal-aware way.
We evaluate our method on a practical flight-booking task using AirDialogue.
- Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning (arXiv, 2020-11-13)
We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model.
We show that these tasks provide positive inductive biases to each other, with the optimal contribution of each task depending on the severity of the noise in its data.
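Referenced from the SpeechPrompt entry above: a minimal sketch of prompt tuning on a unit-based speech LM. The dimensions, the frozen-embedding setup, and all variable names are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative prompt-tuning setup for a unit-based speech LM: learnable
# prompt vectors are prepended to embedded discrete speech units, and only
# the prompts (not the LM) would be trained. All sizes are arbitrary.
import torch
import torch.nn as nn

vocab_size, dim, n_prompt = 100, 256, 8
unit_embed = nn.Embedding(vocab_size, dim)                 # stand-in for a frozen speech-LM embedding
unit_embed.weight.requires_grad_(False)
prompts = nn.Parameter(torch.randn(n_prompt, dim) * 0.02)  # trainable task prompts

units = torch.randint(0, vocab_size, (1, 40))              # a sequence of discrete speech units
inputs = torch.cat([prompts.unsqueeze(0), unit_embed(units)], dim=1)
print(inputs.shape)  # torch.Size([1, 48, 256]) -> fed to the frozen LM
```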
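Referenced from the adapter-based entry above: a generic bottleneck adapter of the kind commonly used for adapter-based fine-tuning. This sketches the general technique assuming a PyTorch backbone; it is not the paper's specific architecture.

```python
# Generic bottleneck adapter: small residual modules are inserted into a
# frozen backbone and are the only parameters updated during fine-tuning.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down to the bottleneck
        self.up = nn.Linear(bottleneck, dim)    # project back up
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps the backbone's signal

hidden = torch.randn(2, 50, 768)   # (batch, frames, hidden_dim) from a frozen encoder layer
print(Adapter(768)(hidden).shape)  # torch.Size([2, 50, 768])
```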
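Referenced from the goal-oriented dialogue entry above: the standard POMDP tuple that formulation builds on. This is the textbook definition; the mapping to dialogue components is an illustrative reading, not the paper's exact instantiation.

```latex
% Generic POMDP tuple (textbook definition; the dialogue mapping is illustrative):
% S: dialogue states (user goal and history), A: system actions/utterances,
% O: observed user utterances, T(s' \mid s, a): transition dynamics,
% Z(o \mid s'): observation model, r(s, a): task reward, \gamma: discount factor.
\mathcal{M} = (S, A, O, T, Z, r, \gamma)
```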