LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models
- URL: http://arxiv.org/abs/2506.04586v1
- Date: Thu, 05 Jun 2025 03:00:04 GMT
- Title: LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models
- Authors: Wen Ding, Fan Qian,
- Abstract summary: We introduce a versatile framework that leverages Large Language Models (LLMs) to correct pseudo labels generated from in-the-wild data.<n>Within the LESS framework, pseudo-labeled text from Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST) of the unsupervised data is refined by an LLM.<n> Experiments on both Mandarin ASR and Spanish-to-English AST tasks show that LESS achieves a notable absolute WER reduction of 3.77%.
- Score: 3.5297361401370053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce LESS (Large Language Model Enhanced Semi-supervised Learning), a versatile framework that leverages Large Language Models (LLMs) to correct pseudo labels generated from in-the-wild data. Within the LESS framework, pseudo-labeled text from Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST) of the unsupervised data is refined by an LLM, and augmented by a data filtering strategy to optimize LLM knowledge transfer efficiency. Experiments on both Mandarin ASR and Spanish-to-English AST tasks show that LESS achieves a notable absolute WER reduction of 3.77% on the Wenet Speech test set, as well as BLEU scores of 34.0 and 64.7 on Callhome and Fisher test sets respectively. These results validate the adaptability of LESS across different languages, tasks, and domains. Ablation studies conducted with various LLMs and prompt configurations provide novel insights into leveraging LLM-derived knowledge for speech processing applications.
Related papers
- In-context Language Learning for Endangered Languages in Speech Recognition [15.294500162002345]
We investigate whether large language models (LLMs) can learn unseen, low-resource languages through in-context learning (ICL)<n>We show ICL enables LLMs to achieve ASR performance that is comparable to or even surpasses dedicated language models trained specifically for these languages.
arXiv Detail & Related papers (2025-05-26T18:38:59Z) - On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation [7.478369203246005]
Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering tasks.<n>In multilingual RAG, retrieved passages can be written in languages other than that of the query entered by the user.
arXiv Detail & Related papers (2025-04-01T09:55:23Z) - Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.<n>Currently, instruction-tuned large language models (LLMs) excel at various English tasks.<n>Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs)<n>We present a simple yet effective automatic process for creating speech-text pair data.<n>Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z) - Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT [9.682499180341273]
Large language models (LLMs) have significantly advanced text generation, but the human-like quality of their outputs presents major challenges.<n>We propose CUDRT, a comprehensive evaluation framework and bilingual benchmark in Chinese and English.<n>This framework supports scalable, reproducible experiments and enables analysis of how operational diversity, multilingual training sets, and LLM architectures influence detection performance.
arXiv Detail & Related papers (2024-06-13T12:43:40Z) - Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained large language model (LLM)
By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z) - Exploring the Integration of Large Language Models into Automatic Speech
Recognition Systems: An Empirical Study [0.0]
This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems.
Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems.
arXiv Detail & Related papers (2023-07-13T02:31:55Z) - Large Language Models Are Latent Variable Models: Explaining and Finding
Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z) - Multilingual Speech Recognition using Knowledge Transfer across Learning
Processes [15.927513451432946]
Experimental results reveal the best pre-training strategy resulting in 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields 3.51% relative reduction in overall WER when using language ID.
arXiv Detail & Related papers (2021-10-15T07:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.