Related papers: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

URL: http://arxiv.org/abs/2406.08940v2
Date: Mon, 15 Jul 2024 10:42:02 GMT
Title: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation
Authors: Kosuke Doi, Yuka Ko, Mana Makinae, Katsuhito Sudoh, Satoshi Nakamura,
Abstract summary: This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI) We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation dataset. We further investigated the features of CMT sentences by evaluating the output from the existing speech translation (ST) and simultaneous speech translation (simulST) models on the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation dataset.
Score: 13.713981533436135
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI). Word order differences are one of the biggest challenges in SI, especially for language pairs with significant structural differences like English and Japanese. We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation Dataset and identified some grammatical structures that make monotonic translation difficult in English-Japanese SI. We further investigated the features of CMT sentences by evaluating the output from the existing speech translation (ST) and simultaneous speech translation (simulST) models on the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation Dataset as well as on existing test sets. The results indicate the possibility that the existing SI-based test set underestimates the model performance. The results also suggest that using CMT sentences as references gives higher scores to simulST models than ST models, and that using an offline-based test set to evaluate the simulST models underestimates the model performance.

Related papers

Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation [0.0]
In this paper, we compare Czech-specific and multilingual sentence embedding models through intrinsic and extrinsic evaluation paradigms.<n>For intrinsic evaluation, we employ Costra, a complex sentence transformation dataset, and several Semantic Textual Similarity (STS) benchmarks to assess the ability of the embeddings to capture linguistic phenomena.<n>In the extrinsic evaluation, we fine-tune each embedding model using COMET-based metrics for machine translation evaluation.
arXiv Detail & Related papers (2025-06-25T07:46:17Z)
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data [64.4458540273004]
We propose a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of Large Language Models (LLMs) Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions.
arXiv Detail & Related papers (2025-04-20T16:20:30Z)
Context-Aware Machine Translation with Source Coreference Explanation [26.336947440529713]
We propose a model that explains the decisions made for translation by predicting coreference features in the input. We evaluate our method in the WMT document-level translation task of English-German dataset, the English-Russian dataset, and the multilingual TED talk dataset.
arXiv Detail & Related papers (2024-04-30T12:41:00Z)
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data [16.05089716626287]
We propose an effective way to train a SimulST model using mixed data of SI and offline. Experiment results show improvements of BLEURT in different latency ranges.
arXiv Detail & Related papers (2023-06-14T15:42:06Z)
Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations. We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing machine translation (NMT) studies mainly focus on developing dataset-specific models. We propose a versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks. OurNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation [14.758033756564858]
Simultaneous machine translation (SimulMT) models start translation before the end of the source sentence. General full-sentence translation test set is acquired by offline translation of the entire source sentence. We manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C.
arXiv Detail & Related papers (2023-03-02T05:06:44Z)
Improving Simultaneous Machine Translation with Monolingual Data [94.1085601198393]
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. We propose to leverage monolingual data to improve SiMT, which trains a SiMT student on the combination of bilingual data and external monolingual data distilled by Seq-KD.
arXiv Detail & Related papers (2022-12-02T14:13:53Z)
Compositional Evaluation on Japanese Textual Entailment and Similarity [20.864082353441685]
Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. There are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English.
arXiv Detail & Related papers (2022-08-09T15:10:56Z)
Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available. Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality. We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that SiMT systems should be trained and tested on real interpretation data. Our results highlight the difference of up-to 13.83 BLEU score when SiMT models are evaluated on translation vs interpretation data.
arXiv Detail & Related papers (2021-10-11T12:27:07Z)
ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and an endangered language Cherokee. It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
Translating the Unseen? Yor\`ub\'a $\rightarrow$ English MT in Low-Resource, Morphologically-Unmarked Settings [8.006185289499049]
Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation. In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems when translating bare nouns in Yorub'a into English.
arXiv Detail & Related papers (2021-03-07T01:24:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.