KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025
- URL: http://arxiv.org/abs/2505.13036v1
- Date: Mon, 19 May 2025 12:21:29 GMT
- Title: KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025
- Authors: Sai Koneru, Maike Züfle, Thai-Binh Nguyen, Seymanur Akti, Jan Niehues, Alexander Waibel,
- Abstract summary: We present the Karlsruhe Institute of Technology's submissions for the Offline ST and Instruction Following (IF) tracks.<n>We propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context.<n>For the IF track, we develop an end-to-end model that integrates a speech encoder with an LLM to perform a wide range of instruction-following tasks.
- Score: 56.61209412965054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scope of the International Workshop on Spoken Language Translation (IWSLT) has recently broadened beyond traditional Speech Translation (ST) to encompass a wider array of tasks, including Speech Question Answering and Summarization. This shift is partly driven by the growing capabilities of modern systems, particularly with the success of Large Language Models (LLMs). In this paper, we present the Karlsruhe Institute of Technology's submissions for the Offline ST and Instruction Following (IF) tracks, where we leverage LLMs to enhance performance across all tasks. For the Offline ST track, we propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context. This is followed by a two-step translation process, incorporating additional refinement step to improve translation quality. For the IF track, we develop an end-to-end model that integrates a speech encoder with an LLM to perform a wide range of instruction-following tasks. We complement it with a final document-level refinement stage to further enhance output quality by using contextual information.
Related papers
- Speech Translation Refinement using Large Language Models [8.602429274223693]
This paper investigates how large language models (LLMs) can improve the performance of speech translation by introducing a joint refinement process.<n>Through the joint refinement of speech translation (ST) and automatic speech recognition (ASR) transcription via LLMs, the performance of the ST model is significantly improved.<n> Experimental results on the MuST-C and CoVoST 2 datasets, which include seven translation tasks, demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2025-01-25T05:32:42Z) - Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation [14.746190461312036]
Large language models (LLMs) have revolutionized natural language processing (NLP)
We introduce a text-to-speech (TTS) system powered by a fine-tuned Llama model, named TTS-Llama, that achieves state-of-the-art speech synthesis performance.
We further propose MoLE-Llama, a text-and-speech multimodal LLM developed through purely late-fusion parameter-efficient fine-tuning (PEFT) and a mixture-of-expert architecture.
arXiv Detail & Related papers (2024-10-27T04:28:57Z) - DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs)<n>We present a simple yet effective automatic process for creating speech-text pair data.<n>Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z) - Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST)
arXiv Detail & Related papers (2024-06-24T16:38:17Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained large language model (LLM)
By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z) - The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline
Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z) - FST: the FAIR Speech Translation System for the IWSLT21 Multilingual
Shared Task [36.51221186190272]
We describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign.
Our system is built by leveraging transfer learning across modalities, tasks and languages.
arXiv Detail & Related papers (2021-07-14T19:43:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.