BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine
Translation
- URL: http://arxiv.org/abs/2403.03521v1
- Date: Wed, 6 Mar 2024 08:02:21 GMT
- Title: BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine
Translation
- Authors: Carinne Cherf, Yuval Pinter
- Abstract summary: We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
- Score: 4.651581292181871
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural machine translation (NMT) has progressed rapidly in the past few
years, promising improvements and quality translations for different languages.
Evaluation of this task is crucial to determine the quality of the translation.
Overall, insufficient emphasis is placed on the actual sense of the translation
in traditional methods. We propose a bidirectional semantic-based evaluation
method designed to assess the sense distance of the translation from the source
text. This approach employs the comprehensive multilingual encyclopedic
dictionary BabelNet. Through the calculation of the semantic distance between
the source and its back translation of the output, our method introduces a
quantifiable approach that empowers sentence comparison on the same linguistic
level. Factual analysis shows a strong correlation between the average
evaluation scores generated by our method and the human assessments across
various machine translation systems for the English-German language pair. Finally,
our method proposes a new multilingual approach to rank MT systems without the
need for parallel corpora.
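
The back-translation comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_SENSES` is a hypothetical hand-written sense inventory standing in for BabelNet lookups, the 0 / 0.5 / 1 distance values are arbitrary assumptions, and the function names are invented for illustration. It only shows the general idea of scoring a source sentence against the back translation of the MT output, then ranking systems by their average score without reference translations.

```python
from statistics import mean

# Hypothetical toy sense inventory; a real system would query BabelNet instead.
TOY_SENSES = {
    "car": {"s:vehicle"},
    "automobile": {"s:vehicle"},
    "bank": {"s:finance", "s:river"},
    "shore": {"s:river"},
    "big": {"s:large"},
    "large": {"s:large"},
}


def word_distance(a: str, b: str) -> float:
    """Assumed scale: 0.0 for identical words, 0.5 for a shared sense, 1.0 otherwise."""
    if a == b:
        return 0.0
    if TOY_SENSES.get(a, set()) & TOY_SENSES.get(b, set()):
        return 0.5
    return 1.0


def sentence_distance(source: str, back_translation: str) -> float:
    """Average, over source words, of the distance to the closest back-translated word."""
    src = source.lower().split()
    back = back_translation.lower().split()
    if not src or not back:
        return 1.0
    return mean(min(word_distance(s, b) for b in back) for s in src)


def rank_systems(sources, back_translations_by_system):
    """Return system names ordered from lowest to highest mean distance, plus the scores."""
    scores = {
        name: mean(sentence_distance(s, b) for s, b in zip(sources, backs))
        for name, backs in back_translations_by_system.items()
    }
    return sorted(scores, key=scores.get), scores


if __name__ == "__main__":
    sources = ["the big car"]
    systems = {
        "system_a": ["the large automobile"],  # back translation keeps the senses
        "system_b": ["the small house"],       # back translation drifts in meaning
    }
    ranking, scores = rank_systems(sources, systems)
    print(ranking, scores)
```

Because both sentences being compared are in the source language, only a source-side sense lookup is needed here, which is what lets this kind of ranking run without parallel reference translations.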
Related papers
- Cross-lingual neural fuzzy matching for exploiting target-language
monolingual corpora in computer-aided translation [0.0]
In this paper, we introduce a novel neural approach aimed at exploiting in-domain target-language (TL) monolingual corpora.
Our approach relies on cross-lingual sentence embeddings to retrieve translation proposals from TL monolingual corpora, and on a neural model to estimate their post-editing effort.
The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment.
arXiv Detail & Related papers (2024-01-16T14:00:28Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Iterative Translation Refinement with Large Language Models [25.90607157524168]
We propose iteratively prompting a large language model to self-correct a translation.
We also discuss the challenges in evaluation and relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Consistent Human Evaluation of Machine Translation across Language Pairs [21.81895199744468]
We propose a new metric called XSTS that is more focused on semantic equivalence and a cross-lingual calibration method.
We demonstrate the effectiveness of these novel contributions in large scale evaluation studies across up to 14 language pairs.
arXiv Detail & Related papers (2022-05-17T17:57:06Z)
- Quality Estimation Using Round-trip Translation with Sentence Embeddings [0.0]
We revisit round-trip translation, proposing a system that aims to address the pitfalls previously found with the approach.
Our method makes use of recent advances in language representation learning to more accurately gauge the similarity between the original and round-trip sentences.
arXiv Detail & Related papers (2021-10-31T17:51:12Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns reference-free machine translation (MT) evaluation, where source texts are compared directly to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.