Related papers: Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation

Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation

URL: http://arxiv.org/abs/2203.02458v1
Date: Fri, 4 Mar 2022 17:41:39 GMT
Title: Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation
Authors: D\'avid Javorsk\'y, Dominik Mach\'a\v{c}ek, Ond\v{r}ej Bojar
Abstract summary: In simultaneous speech translation, one can vary the size of the output window, system latency and sometimes the allowed level of rewriting. The effect of these properties on readability and comprehensibility has not been tested with modern neural translation systems. It is a pilot study with 14 users on 2 hours of German documentaries or speeches with online translations into Czech.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In simultaneous speech translation, one can vary the size of the output window, system latency and sometimes the allowed level of rewriting. The effect of these properties on readability and comprehensibility has not been tested with modern neural translation systems. In this work, we propose an evaluation method and investigate the effects on comprehension and user preferences. It is a pilot study with 14 users on 2 hours of German documentaries or speeches with online translations into Czech. We collect continuous feedback and answers on factual questions. Our results show that the subtitling layout or flicker have a little effect on comprehension, in contrast to machine translation itself and individual competence. Other results show that users with a limited knowledge of the source language have different preferences to stability and latency than the users with zero knowledge. The results are statistically insignificant, however, we show that our method works and can be reproduced in larger volume.

Related papers

Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFeval, an evaluation framework designed to assess instruction-following capabilities.<n>Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training.<n>Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z)
Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering [68.3400058037817]
We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality. We show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations.
arXiv Detail & Related papers (2025-04-10T09:24:54Z)
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions [45.34333059156364]
We introduce QualiSpeech, a comprehensive low-level speech quality assessment dataset. We also propose the QualiSpeech Benchmark to evaluate the low-level speech understanding capabilities of auditory large language models.
arXiv Detail & Related papers (2025-03-26T07:32:20Z)
Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics? [9.234136424254261]
Text style transfer (TST) is the task of transforming a text to reflect a particular style while preserving its original content. Using human evaluation is ideal but costly, as is common in other natural language processing (NLP) tasks. In this paper, we examine both set of existing and novel metrics from broader NLP tasks for TST evaluation.
arXiv Detail & Related papers (2025-02-07T07:39:17Z)
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
evaluating machine-generated audio captions is a complex task that requires considering diverse factors. We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models. In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z)
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems. Previous studies suggest using only a portion of the dialogue context in the annotation process. This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
Crowdsourced Multilingual Speech Intelligibility Testing [0.0]
We propose an approach for a crowdsourced intelligibility assessment. Standards and recommendations are yet to be defined. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.
arXiv Detail & Related papers (2024-03-21T20:14:53Z)
Is Context Helpful for Chat Translation Evaluation? [23.440392979857247]
We conduct a meta-evaluation of existing sentence-level automatic metrics to assess the quality of machine-translated chats. We find that reference-free metrics lag behind reference-based ones, especially when evaluating translation quality in out-of-English settings. We propose a new evaluation metric, Context-MQM, that utilizes bilingual context with a large language model.
arXiv Detail & Related papers (2024-03-13T07:49:50Z)
DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution. It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector. Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z)
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric [66.73705349465207]
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. We propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.
arXiv Detail & Related papers (2022-12-16T14:00:26Z)
Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without awareness. We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator. We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation [10.132491257235024]
We conduct an extensive correlation analysis of Continuous Ratings (CR) and offline machine translation evaluation metrics. Our study reveals that the offline metrics are well correlated with CR and can be reliably used for evaluating machine translation in simultaneous mode. We conclude that given the current quality levels of SST, these metrics can be used as proxies for CR, alleviating the need for large scale human evaluation.
arXiv Detail & Related papers (2022-11-16T03:03:56Z)
ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and an endangered language Cherokee. It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
Towards the evaluation of simultaneous speech translation from a communicative perspective [0.0]
We present the results of an experiment aimed at evaluating the quality of a simultaneous speech translation engine. We found better performance for the human interpreters in terms of intelligibility, while the machine performs slightly better in terms of informativeness.
arXiv Detail & Related papers (2021-03-15T13:09:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.