A Technical Report: BUT Speech Translation Systems
- URL: http://arxiv.org/abs/2010.11593v1
- Date: Thu, 22 Oct 2020 10:52:31 GMT
- Title: A Technical Report: BUT Speech Translation Systems
- Authors: Hari Krishna Vydana, Lukas Burget, Jan Cernocky
- Abstract summary: The paper describes BUT's speech translation systems.
The systems are English$\longrightarrow$German offline speech translation systems.
A large degradation is observed when translating ASR hypotheses compared to the oracle input text.
- Score: 2.9327503320877457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper describes BUT's speech translation systems. The systems are
English$\longrightarrow$German offline speech translation systems, based on our
previous work \cite{Jointly_trained_transformers}. Although End-to-End and
cascade (ASR-MT) spoken language translation (SLT) systems reach comparable
performance, a large degradation is observed when translating ASR hypotheses
compared to the oracle input text. To reduce this performance degradation, we
jointly train the ASR and MT modules with the ASR objective as an auxiliary
loss. The two networks are connected through neural hidden representations.
This model has an end-to-end differentiable path with respect to the final
objective function and also utilizes the ASR objective for better optimization.
During inference, both modules (i.e., ASR and MT) are connected through the
hidden representations corresponding to the n-best ASR hypotheses. Ensembling
with independently trained ASR and MT models has further improved the
performance of the system.
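The jointly trained architecture can be illustrated concretely. Below is a minimal PyTorch sketch, assuming small illustrative module sizes and an assumed auxiliary-loss weight; it shows only the teacher-forced training path (causal masking and the paper's n-best hidden-state connection at inference are omitted), and it is not the authors' exact implementation.
```python
import torch
import torch.nn as nn


class JointASRMT(nn.Module):
    """ASR and MT modules coupled through the ASR decoder's hidden states."""

    def __init__(self, n_src_vocab, n_tgt_vocab, d_model=256, asr_weight=0.3):
        super().__init__()
        self.asr_weight = asr_weight  # auxiliary-loss weight (assumed value)
        # ASR branch: speech encoder + transcript decoder.
        self.speech_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.asr_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.src_emb = nn.Embedding(n_src_vocab, d_model)
        self.asr_out = nn.Linear(d_model, n_src_vocab)
        # MT branch: encodes the ASR decoder's hidden states, decodes German.
        self.mt_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.mt_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.tgt_emb = nn.Embedding(n_tgt_vocab, d_model)
        self.mt_out = nn.Linear(d_model, n_tgt_vocab)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, speech_feats, src_tokens, tgt_tokens):
        # speech_feats: (B, T, d_model) acoustic features already projected
        # to the model dimension (a simplifying assumption).
        enc = self.speech_enc(speech_feats)
        # ASR pass (teacher-forced; causal masks omitted for brevity).
        asr_hidden = self.asr_dec(self.src_emb(src_tokens[:, :-1]), enc)
        asr_logits = self.asr_out(asr_hidden)
        # MT pass: the continuous asr_hidden states are the MT input, so the
        # translation loss back-propagates through the ASR module as well.
        memory = self.mt_enc(asr_hidden)
        mt_hidden = self.mt_dec(self.tgt_emb(tgt_tokens[:, :-1]), memory)
        mt_logits = self.mt_out(mt_hidden)
        # Final objective: MT loss plus the weighted auxiliary ASR loss.
        asr_loss = self.ce(asr_logits.transpose(1, 2), src_tokens[:, 1:])
        mt_loss = self.ce(mt_logits.transpose(1, 2), tgt_tokens[:, 1:])
        return mt_loss + self.asr_weight * asr_loss
```
The point the sketch captures is that asr_hidden is never discretized: the translation loss can reach the speech encoder, while the auxiliary ASR loss keeps the recognition branch well optimized.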
Related papers
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
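A minimal sketch of that confidence-based filtering idea, assuming a hypothetical correct_with_llm callable and an illustrative threshold (neither is from the paper):
```python
from typing import Callable, List, Tuple


def filter_and_correct(
    hyps: List[Tuple[str, float]],           # (transcript, confidence in [0, 1])
    correct_with_llm: Callable[[str], str],  # hypothetical LLM correction call
    threshold: float = 0.85,                 # assumed cutoff, not the paper's
) -> List[str]:
    corrected = []
    for text, conf in hyps:
        # High-confidence transcripts bypass the LLM so that errors are not
        # introduced into transcripts that are probably already correct.
        corrected.append(text if conf >= threshold else correct_with_llm(text))
    return corrected
```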
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
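A minimal PyTorch sketch of that multi-hypothesis fusion idea, with a shared hypothesis encoder and a single cross-attention layer as simplifying assumptions (the paper's actual fusion architecture may differ):
```python
import torch
import torch.nn as nn


class MultiHypothesisFusion(nn.Module):
    """Fuse n-best ASR hypotheses via attention for a downstream summarizer."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4,
                                                batch_first=True)

    def forward(self, nbest_tokens, dec_states):
        # nbest_tokens: (batch, n_hyps, seq_len) token ids of n-best hypotheses
        # dec_states:   (batch, tgt_len, d_model) summarizer decoder states
        b, n, t = nbest_tokens.shape
        # Encode every hypothesis independently with a shared encoder.
        enc = self.encoder(self.emb(nbest_tokens.reshape(b * n, t)))
        # Flatten all hypotheses into one memory the decoder attends over,
        # letting attention down-weight tokens from erroneous hypotheses.
        memory = enc.reshape(b, n * t, -1)
        fused, _ = self.cross_attn(dec_states, memory, memory)
        return fused  # context vectors less sensitive to any single ASR error
```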
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
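Of those strategies, multi-feature reranking is easy to sketch: score each n-best candidate as a weighted sum of feature scores and keep the best. The feature names and weights below are illustrative assumptions, not the submission's actual configuration.
```python
from typing import Dict, List


def rerank(candidates: List[Dict[str, float]],
           weights: Dict[str, float]) -> int:
    """Return the index of the best candidate under the weighted features.

    Each candidate is a dict of feature name -> score, e.g.
    {"mt_score": -3.2, "r2l_score": -4.1, "lm_score": -5.0} (hypothetical).
    """
    def score(c: Dict[str, float]) -> float:
        # Missing features contribute nothing; weights set their trade-off.
        return sum(weights.get(name, 0.0) * value for name, value in c.items())

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))
```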
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- The IWSLT 2021 BUT Speech Translation Systems [2.4373900721120285]
This paper describes BUT's English-to-German offline speech translation (ST) systems developed for IWSLT 2021.
They are based on jointly trained Automatic Speech Recognition-Machine Translation models.
Their performance is evaluated on the MuST-C Common test set.
arXiv Detail & Related papers (2021-07-13T15:11:18Z)
- Cascaded Models With Cyclic Feedback For Direct Speech Translation [14.839931533868176]
We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data.
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
arXiv Detail & Related papers (2020-10-21T17:18:51Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
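A minimal PyTorch sketch of one way such a mechanism could work, pooling each phrase's token representations with a learned attention query; the single-query design and sizes are assumptions, not the paper's exact formulation:
```python
import torch
import torch.nn as nn


class PhraseRepresentation(nn.Module):
    """Pool token representations inside a phrase span into one vector."""

    def __init__(self, d_model=512):
        super().__init__()
        # A single learned query attends over the tokens of each phrase
        # (an assumed design; the paper may parameterize this differently).
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=8,
                                          batch_first=True)

    def forward(self, token_reps, spans):
        # token_reps: (seq_len, d_model) encoder states for one sentence
        # spans: list of (start, end) phrase boundaries, end exclusive
        phrases = []
        for start, end in spans:
            tokens = token_reps[start:end].unsqueeze(0)  # (1, len, d_model)
            pooled, _ = self.attn(self.query, tokens, tokens)
            phrases.append(pooled.squeeze(0).squeeze(0))
        return torch.stack(phrases)  # (n_phrases, d_model)
```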
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Jointly Trained Transformers models for Spoken Language Translation [2.3886615435250302]
This work trains SLT systems with the ASR objective as an auxiliary loss, and the two networks are connected through neural hidden representations.
This architecture improved the BLEU score from 36.8 to 44.5.
All the experiments are reported on English-Portuguese speech translation task using How2 corpus.
arXiv Detail & Related papers (2020-04-25T11:28:39Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
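A minimal PyTorch sketch of a joint model in that spirit: a shared encoder over the ASR output feeds a token-level correction head and an utterance-level LU head, trained with a summed multi-task loss. Sizes, the mean pooling, and the equal loss weighting are assumptions, not the paper's setup.
```python
import torch
import torch.nn as nn


class JointCorrectionLU(nn.Module):
    """Shared encoder with ASR-correction and language-understanding heads."""

    def __init__(self, vocab_size, n_intents, d_model=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        # Token-level rewrite head (assumes aligned lengths for simplicity).
        self.correction_head = nn.Linear(d_model, vocab_size)
        # Utterance-level intent head over mean-pooled encoder states.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, asr_tokens, gold_tokens, intent_label):
        enc = self.encoder(self.emb(asr_tokens))            # (B, T, d_model)
        corr_logits = self.correction_head(enc)             # (B, T, vocab)
        intent_logits = self.intent_head(enc.mean(dim=1))   # (B, n_intents)
        corr_loss = self.ce(corr_logits.transpose(1, 2), gold_tokens)
        lu_loss = self.ce(intent_logits, intent_label)
        return corr_loss + lu_loss  # equal weighting is an assumption
```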
arXiv Detail & Related papers (2020-01-28T22:09:25Z)