NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures
- URL: http://arxiv.org/abs/2204.13692v1
- Date: Thu, 28 Apr 2022 17:57:17 GMT
- Title: NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures
- Authors: Jannis Vamvas and Rico Sennrich
- Abstract summary: We analyze translation-based similarity measures in the common framework of multilingual NMT.
Compared to baselines such as sentence embeddings, translation-based measures prove competitive in paraphrase identification.
The measures also show a relatively high correlation with human judgments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Being able to rank the similarity of short text segments is an interesting
bonus feature of neural machine translation. Translation-based similarity
measures include direct and pivot translation probability, as well as
translation cross-likelihood, which has not been studied so far. We analyze
these measures in the common framework of multilingual NMT, releasing the
NMTScore library (available at https://github.com/ZurichNLP/nmtscore). Compared
to baselines such as sentence embeddings, translation-based measures prove
competitive in paraphrase identification and are more robust against
adversarial or multilingual input, especially if proper normalization is
applied. When used for reference-based evaluation of data-to-text generation in
2 tasks and 17 languages, translation-based measures show a relatively high
correlation with human judgments.
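To illustrate the normalization idea only, the sketch below symmetrizes a direct translation-probability measure and divides each direction by the model's "self-translation" probability. The toy `translation_logprob` is a word-overlap stand-in for a real multilingual NMT model, and none of these names are the NMTScore library's API; this is an assumption-laden sketch, not the paper's implementation.

```python
import math

# Toy stand-in for a length-normalized NMT log-probability log p(tgt | src).
# A real measure would query a multilingual NMT model; this is illustrative only.
def translation_logprob(src: str, tgt: str) -> float:
    src_tokens, tgt_tokens = set(src.lower().split()), set(tgt.lower().split())
    overlap = len(src_tokens & tgt_tokens) / max(len(tgt_tokens), 1)
    return math.log(0.05 + 0.95 * overlap)  # smoothed to avoid log(0)

def direct_similarity(a: str, b: str, normalize: bool = True) -> float:
    """Symmetrized direct translation probability.

    With normalization, each direction is divided by the model's probability
    of 'translating' the target into itself, which compensates for sentences
    that are inherently easy or hard to generate.
    """
    score_ab = math.exp(translation_logprob(a, b))
    score_ba = math.exp(translation_logprob(b, a))
    if normalize:
        score_ab /= math.exp(translation_logprob(b, b))
        score_ba /= math.exp(translation_logprob(a, a))
    return (score_ab + score_ba) / 2

print(direct_similarity("the cat sat", "the cat sat"))      # identical pair scores highest
print(direct_similarity("the cat sat", "dogs bark loudly"))  # unrelated pair scores low
```

With a real NMT model in place of the stub, the same normalization makes scores comparable across sentences of different difficulty, which is what the abstract credits for the robustness gains.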
Related papers
- Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation [0.9576327614980397]
This study aims to assess the reliability of automatic metrics in evaluating simultaneous interpretations by analyzing their correlation with human evaluations.
As a benchmark we use human assessments performed by language experts, and evaluate how well sentence embeddings and Large Language Models correlate with them.
The results suggest GPT models, particularly GPT-3.5 with direct prompting, demonstrate the strongest correlation with human judgment in terms of semantic similarity between source and target texts.
arXiv Detail & Related papers (2024-06-14T14:47:19Z)
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper addresses the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
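The mask-and-predict idea can be sketched as follows. The toy distribution model below is a smoothed unigram stand-in for a real MLM (such as BERT), and all function names here are illustrative assumptions, not the paper's code: align shared words via a longest common subsequence, then sum the divergence between the predicted distributions at the aligned positions.

```python
import math
from difflib import SequenceMatcher

def lcs_words(a, b):
    """Aligned (i, j) positions of shared words (longest common subsequence)."""
    matcher = SequenceMatcher(a=a, b=b)
    pairs = []
    for block in matcher.get_matching_blocks():
        pairs.extend((block.a + k, block.b + k) for k in range(block.size))
    return pairs

def toy_mlm_distribution(tokens, position, vocab):
    """Stand-in for an MLM's predicted distribution at a masked position:
    here, an add-one-smoothed unigram distribution over the remaining context.
    (A real implementation would mask `position` and query an MLM.)"""
    context = [t for i, t in enumerate(tokens) if i != position]
    counts = {w: 1 + context.count(w) for w in vocab}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def ndd(sent_a: str, sent_b: str) -> float:
    """Sum of KL divergences between predicted distributions at aligned
    shared-word positions; larger means more semantic difference."""
    a, b = sent_a.split(), sent_b.split()
    vocab = sorted(set(a) | set(b))
    divergence = 0.0
    for i, j in lcs_words(a, b):
        p = toy_mlm_distribution(a, i, vocab)
        q = toy_mlm_distribution(b, j, vocab)
        divergence += sum(p[w] * math.log(p[w] / q[w]) for w in vocab)
    return divergence

print(ndd("the movie was great", "the movie was great"))  # identical pair: zero divergence
print(ndd("the movie was great", "the movie was awful"))  # differing word shifts the context distributions
```

Because the measure is computed only at the positions both texts share, it stays informative even when the surface overlap between the pair is very high.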
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Sentiment-based Candidate Selection for NMT [2.580271290008534]
We propose a decoder-side approach that incorporates automatic sentiment scoring into the machine translation (MT) candidate selection process.
We train separate English and Spanish sentiment classifiers, then, using n-best candidates generated by a baseline MT model with beam search, select the candidate that minimizes the absolute difference between the sentiment score of the source sentence and that of the translation.
The results of human evaluations show that, in comparison to the open-source MT model on top of which our pipeline is built, our method produces more accurate translations of colloquial, sentiment-heavy source texts.
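The selection step reduces to an argmin over the n-best list. The sketch below uses a toy lexicon-based polarity scorer in place of the trained English and Spanish classifiers; the lexicons, function names, and scoring formula are all illustrative assumptions, not the paper's models.

```python
# Toy bilingual sentiment lexicons (assumption; the paper trains classifiers).
POSITIVE = {"great", "love", "excellent", "encanta", "genial"}
NEGATIVE = {"terrible", "hate", "awful", "odio", "horrible"}

def sentiment(text: str) -> float:
    """Toy polarity score in [-1, 1] based on lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return max(-1.0, min(1.0, score / max(len(tokens), 1) * 5))

def select_candidate(source: str, nbest: list) -> str:
    """Pick the beam-search candidate whose sentiment score is closest
    to the source sentence's, per the described selection criterion."""
    src_polarity = sentiment(source)
    return min(nbest, key=lambda cand: abs(sentiment(cand) - src_polarity))

print(select_candidate("I love this movie",
                       ["Odio esta película", "Me encanta esta película"]))
```

With real classifiers in place of the lexicon scorer, the same argmin filters out candidates whose polarity drifted from the source during decoding.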
arXiv Detail & Related papers (2021-04-10T19:01:52Z)
- A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences [21.43163704217968]
We propose a new multimodal English-Japanese corpus with comparable sentences that are compiled from existing image captioning datasets.
Due to low translation scores in our baseline experiments, we believe that current multimodal NMT models are not designed to effectively utilize comparable sentence data.
arXiv Detail & Related papers (2020-10-17T06:12:25Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.