A Technical Report: BUT Speech Translation Systems
- URL: http://arxiv.org/abs/2010.11593v1
- Date: Thu, 22 Oct 2020 10:52:31 GMT
- Title: A Technical Report: BUT Speech Translation Systems
- Authors: Hari Krishna Vydana, Lukas Burget, Jan Cernocky
- Abstract summary: The paper describes BUT's speech translation systems.
The systems are English$\longrightarrow$German offline speech translation systems.
A large degradation is observed when translating ASR hypotheses compared to the oracle input text.
- Score: 2.9327503320877457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper describes BUT's speech translation systems. The systems are
English$\longrightarrow$German offline speech translation systems, based on our
previous work \cite{Jointly_trained_transformers}. Although End-to-End and
cascade (ASR-MT) spoken language translation (SLT) systems reach comparable
performance, a large degradation is observed when translating ASR hypotheses
compared to the oracle input text. To reduce this performance degradation, we
jointly train the ASR and MT modules with the ASR objective as an auxiliary
loss. The two networks are connected through neural hidden representations.
This model has an end-to-end differentiable path with respect to the final
objective function and also utilizes the ASR objective for better optimization.
During inference, both modules (i.e., ASR and MT) are connected through the
hidden representations corresponding to the n-best ASR hypotheses. Ensembling
with independently trained ASR and MT models has further improved the
performance of the system.
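The jointly trained architecture can be illustrated concretely. Below is a minimal PyTorch sketch, assuming small illustrative module sizes and an assumed auxiliary-loss weight; it shows only the teacher-forced training path (causal masking and the paper's n-best hidden-state connection at inference are omitted), and it is not the authors' exact implementation.
```python
import torch
import torch.nn as nn


class JointASRMT(nn.Module):
    """ASR and MT modules coupled through the ASR decoder's hidden states."""

    def __init__(self, n_src_vocab, n_tgt_vocab, d_model=256, asr_weight=0.3):
        super().__init__()
        self.asr_weight = asr_weight  # auxiliary-loss weight (assumed value)
        # ASR branch: speech encoder + transcript decoder.
        self.speech_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.asr_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.src_emb = nn.Embedding(n_src_vocab, d_model)
        self.asr_out = nn.Linear(d_model, n_src_vocab)
        # MT branch: encodes the ASR decoder's hidden states, decodes German.
        self.mt_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.mt_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.tgt_emb = nn.Embedding(n_tgt_vocab, d_model)
        self.mt_out = nn.Linear(d_model, n_tgt_vocab)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, speech_feats, src_tokens, tgt_tokens):
        # speech_feats: (B, T, d_model) acoustic features already projected
        # to the model dimension (a simplifying assumption).
        enc = self.speech_enc(speech_feats)
        # ASR pass (teacher-forced; causal masks omitted for brevity).
        asr_hidden = self.asr_dec(self.src_emb(src_tokens[:, :-1]), enc)
        asr_logits = self.asr_out(asr_hidden)
        # MT pass: the continuous asr_hidden states are the MT input, so the
        # translation loss back-propagates through the ASR module as well.
        memory = self.mt_enc(asr_hidden)
        mt_hidden = self.mt_dec(self.tgt_emb(tgt_tokens[:, :-1]), memory)
        mt_logits = self.mt_out(mt_hidden)
        # Final objective: MT loss plus the weighted auxiliary ASR loss.
        asr_loss = self.ce(asr_logits.transpose(1, 2), src_tokens[:, 1:])
        mt_loss = self.ce(mt_logits.transpose(1, 2), tgt_tokens[:, 1:])
        return mt_loss + self.asr_weight * asr_loss
```
The point the sketch captures is that asr_hidden is never discretized: the translation loss can reach the speech encoder, while the auxiliary ASR loss keeps the recognition branch well optimized.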
Related papers
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
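A minimal sketch of that confidence-based filtering idea, assuming a hypothetical correct_with_llm callable and an illustrative threshold (neither is from the paper):
```python
from typing import Callable, List, Tuple


def filter_and_correct(
    hyps: List[Tuple[str, float]],           # (transcript, confidence in [0, 1])
    correct_with_llm: Callable[[str], str],  # hypothetical LLM correction call
    threshold: float = 0.85,                 # assumed cutoff, not the paper's
) -> List[str]:
    corrected = []
    for text, conf in hyps:
        # High-confidence transcripts bypass the LLM so that errors are not
        # introduced into transcripts that are probably already correct.
        corrected.append(text if conf >= threshold else correct_with_llm(text))
    return corrected
```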
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
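A minimal PyTorch sketch of that multi-hypothesis fusion idea, with a shared hypothesis encoder and a single cross-attention layer as simplifying assumptions (the paper's actual fusion architecture may differ):
```python
import torch
import torch.nn as nn


class MultiHypothesisFusion(nn.Module):
    """Fuse n-best ASR hypotheses via attention for a downstream summarizer."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4,
                                                batch_first=True)

    def forward(self, nbest_tokens, dec_states):
        # nbest_tokens: (batch, n_hyps, seq_len) token ids of n-best hypotheses
        # dec_states:   (batch, tgt_len, d_model) summarizer decoder states
        b, n, t = nbest_tokens.shape
        # Encode every hypothesis independently with a shared encoder.
        enc = self.encoder(self.emb(nbest_tokens.reshape(b * n, t)))
        # Flatten all hypotheses into one memory the decoder attends over,
        # letting attention down-weight tokens from erroneous hypotheses.
        memory = enc.reshape(b, n * t, -1)
        fused, _ = self.cross_attn(dec_states, memory, memory)
        return fused  # context vectors less sensitive to any single ASR error
```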
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
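Of those strategies, multi-feature reranking is easy to sketch: score each n-best candidate as a weighted sum of feature scores and keep the best. The feature names and weights below are illustrative assumptions, not the submission's actual configuration.
```python
from typing import Dict, List


def rerank(candidates: List[Dict[str, float]],
           weights: Dict[str, float]) -> int:
    """Return the index of the best candidate under the weighted features.

    Each candidate is a dict of feature name -> score, e.g.
    {"mt_score": -3.2, "r2l_score": -4.1, "lm_score": -5.0} (hypothetical).
    """
    def score(c: Dict[str, float]) -> float:
        # Missing features contribute nothing; weights set their trade-off.
        return sum(weights.get(name, 0.0) * value for name, value in c.items())

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))
```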
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- The IWSLT 2021 BUT Speech Translation Systems [2.4373900721120285]
This paper describes BUT's English-to-German offline speech translation (ST) systems developed for IWSLT 2021.
They are based on jointly trained Automatic Speech Recognition-Machine Translation models.
Their performance is evaluated on the MuST-C Common test set.
arXiv Detail & Related papers (2021-07-13T15:11:18Z)
- Cascaded Models With Cyclic Feedback For Direct Speech Translation [14.839931533868176]
We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data.
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
arXiv Detail & Related papers (2020-10-21T17:18:51Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
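A minimal PyTorch sketch of one way such a mechanism could work, pooling each phrase's token representations with a learned attention query; the single-query design and sizes are assumptions, not the paper's exact formulation:
```python
import torch
import torch.nn as nn


class PhraseRepresentation(nn.Module):
    """Pool token representations inside a phrase span into one vector."""

    def __init__(self, d_model=512):
        super().__init__()
        # A single learned query attends over the tokens of each phrase
        # (an assumed design; the paper may parameterize this differently).
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=8,
                                          batch_first=True)

    def forward(self, token_reps, spans):
        # token_reps: (seq_len, d_model) encoder states for one sentence
        # spans: list of (start, end) phrase boundaries, end exclusive
        phrases = []
        for start, end in spans:
            tokens = token_reps[start:end].unsqueeze(0)  # (1, len, d_model)
            pooled, _ = self.attn(self.query, tokens, tokens)
            phrases.append(pooled.squeeze(0).squeeze(0))
        return torch.stack(phrases)  # (n_phrases, d_model)
```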
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Jointly Trained Transformers models for Spoken Language Translation [2.3886615435250302]
This work trains SLT systems with the ASR objective as an auxiliary loss, and the two networks are connected through neural hidden representations.
This architecture improved the BLEU score from 36.8 to 44.5.
All the experiments are reported on English-Portuguese speech translation task using How2 corpus.
arXiv Detail & Related papers (2020-04-25T11:28:39Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
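A minimal PyTorch sketch of a joint model in that spirit: a shared encoder over the ASR output feeds a token-level correction head and an utterance-level LU head, trained with a summed multi-task loss. Sizes, the mean pooling, and the equal loss weighting are assumptions, not the paper's setup.
```python
import torch
import torch.nn as nn


class JointCorrectionLU(nn.Module):
    """Shared encoder with ASR-correction and language-understanding heads."""

    def __init__(self, vocab_size, n_intents, d_model=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        # Token-level rewrite head (assumes aligned lengths for simplicity).
        self.correction_head = nn.Linear(d_model, vocab_size)
        # Utterance-level intent head over mean-pooled encoder states.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, asr_tokens, gold_tokens, intent_label):
        enc = self.encoder(self.emb(asr_tokens))            # (B, T, d_model)
        corr_logits = self.correction_head(enc)             # (B, T, vocab)
        intent_logits = self.intent_head(enc.mean(dim=1))   # (B, n_intents)
        corr_loss = self.ce(corr_logits.transpose(1, 2), gold_tokens)
        lu_loss = self.ce(intent_logits, intent_label)
        return corr_loss + lu_loss  # equal weighting is an assumption
```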
arXiv Detail & Related papers (2020-01-28T22:09:25Z)