Generative Context-aware Fine-tuning of Self-supervised Speech Models
- URL: http://arxiv.org/abs/2312.09895v1
- Date: Fri, 15 Dec 2023 15:46:02 GMT
- Title: Generative Context-aware Fine-tuning of Self-supervised Speech Models
- Authors: Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji
Watanabe, Karen Livescu
- Abstract summary: We study the use of generative large language models (LLM) generated context information.
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
- Score: 54.389711404209415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When performing tasks like automatic speech recognition or spoken language
understanding for a given utterance, access to preceding text or audio provides
contextual information that can improve performance. Considering the recent advances
in generative large language models (LLM), we hypothesize that an LLM could
generate useful context information using the preceding text. With appropriate
prompts, an LLM could generate a prediction of the next sentence or abstractive
text like titles or topics. In this paper, we study the use of LLM-generated
context information and propose an approach to distill the generated
information during fine-tuning of self-supervised speech models, which we refer
to as generative context-aware fine-tuning. This approach allows the fine-tuned
model to make improved predictions without access to the true surrounding
segments or to the LLM at inference time, while requiring only a very small
additional context module. We evaluate the proposed approach using the SLUE and
Libri-light benchmarks for several downstream tasks: automatic speech
recognition, named entity recognition, and sentiment analysis. The results show
that generative context-aware fine-tuning outperforms a context injection
fine-tuning approach that accesses the ground-truth previous text, and is
competitive with a generative context injection fine-tuning approach that
requires the LLM at inference time.
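The distillation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the linear context module, all names, and all dimensions are assumptions, and in the paper the context module sits alongside a self-supervised speech encoder while the teacher signal is an embedding of LLM-generated context. The sketch only shows the core mechanism: a small module learns to predict the teacher context embedding from speech features, so neither the LLM nor the true surrounding text is needed at inference time.

```python
import numpy as np

# Minimal sketch (assumed names/shapes, not the paper's code): during
# fine-tuning, a small context module learns to predict an LLM-generated
# context embedding from the utterance's speech features.

rng = np.random.default_rng(0)
d_speech, d_context = 16, 8

def context_module(speech_feats, W):
    """Tiny linear context module: mean-pool the frames, then project."""
    return W @ speech_feats.mean(axis=0)

def distillation_loss(pred_ctx, llm_ctx):
    """MSE between the predicted and LLM-generated context embeddings."""
    return float(np.mean((pred_ctx - llm_ctx) ** 2))

speech_feats = rng.standard_normal((50, d_speech))  # frames of one utterance
llm_ctx = rng.standard_normal(d_context)            # frozen "teacher" embedding
W = rng.standard_normal((d_context, d_speech)) * 0.1

initial_loss = distillation_loss(context_module(speech_feats, W), llm_ctx)

# Plain gradient descent on the distillation term alone.
lr = 0.5
pooled = speech_feats.mean(axis=0)
for _ in range(1000):
    err = context_module(speech_feats, W) - llm_ctx
    W -= lr * (2.0 / d_context) * np.outer(err, pooled)

final_loss = distillation_loss(context_module(speech_feats, W), llm_ctx)
print(final_loss < initial_loss)  # prints True
```

In the actual approach, the distillation term would be combined with the downstream task loss (e.g. ASR), and only a very small context module is added to the fine-tuned model.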
Related papers
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization [13.415189715216354]
We introduce a framework for speech summarization that leverages the processing and reasoning capabilities of large language models (LLMs).
We propose an end-to-end system that combines an instruction-tuned LLM with an audio encoder that converts speech into token representations that the LLM can interpret.
arXiv Detail & Related papers (2024-06-10T02:04:28Z)
- Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering [9.86691461253151]
We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of large language models (LLMs).
Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers.
We present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
arXiv Detail & Related papers (2024-05-28T09:12:44Z)
- Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing [56.71450690166821]
We propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM).
VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation.
We show that VSP-LLM trained on just 30 hours of labeled data can more effectively translate lip movements.
arXiv Detail & Related papers (2024-02-23T07:21:32Z)
- LLM-augmented Preference Learning from Natural Language [19.700169351688768]
Large Language Models (LLMs) are equipped to deal with larger context lengths.
LLMs can consistently outperform the SotA when the target text is large.
Few-shot learning yields better performance than zero-shot learning.
arXiv Detail & Related papers (2023-10-12T17:17:27Z)
- Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition [26.043533280932603]
We present a novel integration of an instruction-tuned large language model (LLM) and end-to-end automatic speech recognition (ASR).
We explore using this zero-shot capability of LLMs to extract linguistic information that can contribute to improving ASR performance.
arXiv Detail & Related papers (2023-09-19T11:10:50Z)
- Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment [15.180715595425864]
We introduce DuAl-PT, a novel method to improve the prompt learning of vision-language models by incorporating pre-trained large language models (LLMs).
With DuAl-PT, we propose to learn more context-aware prompts, benefiting from both explicit and implicit context modeling.
Empirically, DuAl-PT achieves superior performance on 11 downstream datasets on few-shot recognition and base-to-new generalization.
arXiv Detail & Related papers (2023-09-08T06:51:15Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.