Related papers: Contrastive Learning for Task-Independent SpeechLLM-Pretraining

Contrastive Learning for Task-Independent SpeechLLM-Pretraining

URL: http://arxiv.org/abs/2412.15712v1
Date: Fri, 20 Dec 2024 09:33:31 GMT
Title: Contrastive Learning for Task-Independent SpeechLLM-Pretraining
Authors: Maike Züfle, Jan Niehues,
Abstract summary: Large language models (LLMs) excel in natural language processing.<n>Direct task-specific fine-tuning is limited by overfitting risks, data requirements, and computational costs.<n>We propose a scalable, two-stage training approach.
Score: 14.531386555183596
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) excel in natural language processing but adapting these LLMs to speech processing tasks efficiently is not straightforward. Direct task-specific fine-tuning is limited by overfitting risks, data requirements, and computational costs. To address these challenges, we propose a scalable, two-stage training approach: (1) A task-independent speech pretraining stage using contrastive learning to align text and speech representations over all layers, followed by (2) a task-specific fine-tuning stage requiring minimal data. This approach outperforms traditional ASR pretraining and enables the model to surpass models specialized on speech translation and question answering while being trained on only 10% of the task-specific data.

Related papers

SpeechMapper: Speech-to-text Embedding Projector for LLMs [8.608235759695287]
SpeechMapper is a cost-efficient speech-to-LLM-embedding training approach.<n>It mitigates overfitting, enabling more robust and generalizable models.
arXiv Detail & Related papers (2026-01-28T09:22:58Z)
AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning [49.68129589035101]
We introduce AZeroS (Auden Zero-instruction-tuned Speech-LLM), which is trained on speech-text pairs derived from publicly available corpora.<n>AZeroS achieves state-of-the-art performance on both semantic and paralinguistic benchmarks.
arXiv Detail & Related papers (2025-12-31T04:05:04Z)
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) We present a simple yet effective automatic process for creating speech-text pair data. Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
SpeechVerse: A Large-scale Generalizable Audio Language Model [38.67969337605572]
SpeechVerse is a robust multi-task training and curriculum learning framework. It combines pre-trained speech and text foundation models via a small set of learnable parameters. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z)
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models. Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks. Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages. In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM) Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, adaptively retrieving external data for continual pretraining of PLMs. Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings. In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training [33.02912456062474]
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST2 speech translation.
arXiv Detail & Related papers (2021-10-20T00:59:36Z)
Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation [79.0866650271659]
Common process of utilizing PrLMs is first pre-training on large-scale general corpora with task-independent LM training objectives, then fine-tuning on task datasets with task-specific training objectives. We introduce task-specific pre-training on in-domain task-related corpora with task-specific objectives. This procedure is placed between the original two stages to enhance the model understanding capacity of specific tasks.
arXiv Detail & Related papers (2020-09-10T16:46:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.