Quantitative Evaluation of Alternative Translations in a Corpus of
Highly Dissimilar Finnish Paraphrases
- URL: http://arxiv.org/abs/2105.02477v1
- Date: Thu, 6 May 2021 07:22:16 GMT
- Authors: Li-Hsin Chang, Sampo Pyysalo, Jenna Kanerva, Filip Ginter
- Abstract summary: We present a quantitative evaluation of differences between alternative translations in a large recently released Finnish paraphrase corpus.
We combine a series of automatic steps detecting systematic variation with manual analysis to reveal regularities and identify categories of translation differences.
- Score: 1.8748036062767652
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we present a quantitative evaluation of differences between
alternative translations in a large, recently released Finnish paraphrase corpus,
focusing in particular on non-trivial variation in translation. We combine a
series of automatic steps detecting systematic variation with manual analysis
to reveal regularities and identify categories of translation differences. We
find that the paraphrase corpus contains highly non-trivial translation variants
that are difficult to recognize through automatic approaches.
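The automatic steps are not detailed in this summary, but the general idea of flagging variation between alternative translations can be illustrated with a small sketch. The following is a minimal, assumed word-level diff over an invented sentence pair using Python's standard difflib; the paper's actual pipeline is considerably richer than this.

```python
# A minimal sketch of one automatic step for surfacing variation:
# token-level edit operations between two alternative translations,
# computed with the standard library's difflib. The example pair is invented.
from difflib import SequenceMatcher

def variation_ops(a: str, b: str):
    """List the non-trivial edit operations turning translation a into b."""
    ta, tb = a.split(), b.split()
    matcher = SequenceMatcher(a=ta, b=tb)
    return [(op, ta[i1:i2], tb[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

pair = ("I could not agree more .", "I fully agree with that .")
for op, left, right in variation_ops(*pair):
    print(op, left, "->", right)
```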
Related papers
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine
Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
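For intuition only, here is a toy sketch of a bidirectional, sense-based check: score word-level sense overlap between the source and a back-translation of the system output. The SYNSETS table is an invented stand-in for BabelNet, and the positional alignment via zip is a simplification; neither reflects BiVert's actual implementation.

```python
# Toy bidirectional check: compare the source against a back-translation of
# the MT output, counting word pairs that share a sense. SYNSETS is an
# invented stand-in for BabelNet; real alignment would not be positional.
SYNSETS = {
    "car": {"auto"}, "auto": {"car"},
    "quick": {"fast"}, "fast": {"quick"},
}

def sense_match(src_word: str, back_word: str) -> bool:
    return src_word == back_word or back_word in SYNSETS.get(src_word, set())

def bidirectional_score(source: str, back_translation: str) -> float:
    src, back = source.split(), back_translation.split()
    hits = sum(sense_match(s, b) for s, b in zip(src, back))
    return hits / max(len(src), len(back))

print(bidirectional_score("the quick car", "the fast auto"))  # -> 1.0
```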
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
- A Comparative Study of Sentence Embedding Models for Assessing Semantic
Variation [0.0]
We compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature.
We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
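A minimal sketch of this kind of comparison, assuming the sentence-transformers package is installed (the model name below is one common choice, not necessarily the paper's): embed the sentences, then compute the successive-sentence similarity series and the pairwise matrix.

```python
# Sketch: embed sentences, then compute (1) cosine similarity between
# successive sentences (a time series) and (2) the pairwise matrix.
# Assumes the sentence-transformers package; the sentences are invented.
import numpy as np
from sentence_transformers import SentenceTransformer

sentences = ["It was a dark night.",
             "Rain drummed on the roof.",
             "Meanwhile, the committee voted on the budget."]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(sentences)                       # (n_sentences, dim)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

successive = (emb[:-1] * emb[1:]).sum(axis=1)       # neighbour similarities
pairwise = emb @ emb.T                              # full similarity matrix
print(successive.round(2))
print(pairwise.round(2))
```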
arXiv Detail & Related papers (2023-08-08T23:31:10Z)
- Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
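A sketch of the decomposition step, under the assumption that prompting happens chunk by chunk: split the source into fixed-size word chunks and build a few-shot prompt per chunk. The demonstrations, chunk size, and Finnish example are invented, and DecoMT additionally merges chunk translations with a contextual second pass not shown here.

```python
# Sketch of decomposed prompting: translate fixed-size word chunks via a
# few-shot prompt. Demonstrations and the chunking size are illustrative;
# DecoMT also performs a contextual merging step not shown here.
def chunks(words: list[str], size: int = 3) -> list[str]:
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def chunk_prompt(source: str, demos: list[tuple[str, str]]) -> str:
    lines = [f"Source: {s}\nTranslation: {t}\n" for s, t in demos]
    lines += [f"Source: {c}\nTranslation:" for c in chunks(source.split())]
    return "\n".join(lines)

demos = [("hyvää huomenta", "good morning")]
print(chunk_prompt("minä menen kauppaan tänään", demos))
```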
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
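As a simplified stand-in for the selection mechanism (HanoiT learns it layer-wise inside the encoder, which is not reproduced here), the sketch below scores candidate context sentences by lexical overlap with the current sentence and keeps the top-k.

```python
# Simplified context selection: rank candidate context sentences by lexical
# overlap with the current sentence and keep the top-k. HanoiT instead
# learns this selection layer-wise; the overlap heuristic is an assumption.
def select_context(current: str, context: list[str], k: int = 2) -> list[str]:
    cur = set(current.lower().split())
    return sorted(context,
                  key=lambda s: len(cur & set(s.lower().split())),
                  reverse=True)[:k]

context = ["The bank raised interest rates.",
           "He sat down by the river bank.",
           "Lunch was served at noon."]
print(select_context("The bank announced new rates today.", context, k=1))
```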
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Exploring Diversity in Back Translation for Low-Resource Machine
Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
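A sketch of one half of that framework: distinct-n as a simple lexical-diversity proxy over a set of back-translated variants (invented here). Measuring syntactic diversity would additionally need a parser or POS tagger, which is omitted.

```python
# Lexical diversity of a set of back-translated variants via distinct-n:
# the fraction of n-grams that are unique. The variants are invented;
# syntactic diversity would need a parser and is not shown.
def distinct_n(sentences: list[str], n: int = 2) -> float:
    ngrams = []
    for s in sentences:
        toks = s.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

variants = ["the cat sat on the mat",
            "a cat was sitting on the mat",
            "on the mat sat the cat"]
print(f"distinct-2: {distinct_n(variants):.2f}")
```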
arXiv Detail & Related papers (2022-06-01T15:21:16Z)
- Multilingual Extraction and Categorization of Lexical Collocations with
Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights into differences in collocation typification in English, Spanish, and French.
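The summary frames collocation recognition as sequence tagging; for concreteness, here is a small sketch that decodes BIO tags into collocation spans. The tags are hard-coded stand-ins for the output of the paper's graph-aware tagger.

```python
# Decode BIO tags into collocation spans. The tags are hard-coded here,
# standing in for the graph-aware BERT tagger's predictions.
def bio_spans(tokens: list[str], tags: list[str]) -> list[str]:
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                       # a new collocation starts
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:         # continue the open span
            current.append(token)
        else:                                # "O" closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["she", "took", "a", "deep", "breath", "then", "heavy", "rain", "fell"]
tags   = ["O",   "B",    "I", "I",    "I",      "O",    "B",     "I",    "O"]
print(bio_spans(tokens, tags))  # -> ['took a deep breath', 'heavy rain']
```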
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT systems.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- Detecting Fine-Grained Cross-Lingual Semantic Divergences without
Supervision by Learning to Rank [28.910206570036593]
This work improves the prediction and annotation of fine-grained semantic divergences.
We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity.
Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model.
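A sketch of the ranking objective, assuming PyTorch: the model's divergence score for a more-divergent synthetic pair should exceed the score for a less-divergent one by a margin. The scores below are dummy tensors standing in for a multilingual BERT head's outputs, and the margin value is arbitrary.

```python
# Learning-to-rank objective on synthetic divergent pairs: the score of the
# more-divergent example should beat the less-divergent one by a margin.
# Dummy tensors stand in for a multilingual BERT scoring head's outputs.
import torch

loss_fn = torch.nn.MarginRankingLoss(margin=0.2)

score_more_divergent = torch.tensor([0.9, 0.4], requires_grad=True)
score_less_divergent = torch.tensor([0.3, 0.5], requires_grad=True)
target = torch.ones(2)  # 1 => the first argument should rank higher

loss = loss_fn(score_more_divergent, score_less_divergent, target)
loss.backward()  # gradients would flow back into the scoring model
print(loss.item())
```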
arXiv Detail & Related papers (2020-10-07T21:26:20Z)
- Neural disambiguation of lemma and part of speech in morphologically
rich languages [0.6346772579930928]
We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages.
We propose a method for disambiguating ambiguous words in context, using a large unannotated text corpus and a morphological analyser.
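A toy version of the setup: a morphological analyser proposes several (lemma, POS) analyses for an ambiguous word, and counts derived from a large unannotated corpus break the tie. The analyser output and counts below are invented, and the actual method conditions on sentence context rather than raw frequency alone.

```python
# Toy disambiguation: pick among a morphological analyser's candidate
# (lemma, POS) analyses using corpus-derived counts. Both the analyses and
# the counts are invented; the real method also uses sentence context.
from collections import Counter

ANALYSES = {"saw": [("see", "VERB"), ("saw", "NOUN")]}
corpus_counts = Counter({("see", "VERB"): 950, ("saw", "NOUN"): 120})

def disambiguate(word: str) -> tuple[str, str]:
    candidates = ANALYSES.get(word, [(word, "UNK")])
    return max(candidates, key=lambda a: corpus_counts.get(a, 0))

print(disambiguate("saw"))  # -> ('see', 'VERB')
```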
arXiv Detail & Related papers (2020-07-12T21:48:52Z)
- Multilingual Alignment of Contextual Word Representations [49.42244463346612]
Aligned multilingual BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
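A sketch of contextual word retrieval: for each contextual vector of a source-language word occurrence, retrieve the nearest neighbour among target-language contextual vectors by cosine similarity. Random vectors stand in for multilingual BERT hidden states.

```python
# Contextual word retrieval, sketched: nearest-neighbour search from source
# word vectors to target word vectors by cosine similarity. The random
# vectors stand in for multilingual BERT hidden states.
import numpy as np

rng = np.random.default_rng(0)
src_vecs = rng.normal(size=(5, 8))    # 5 source-word occurrences
tgt_vecs = rng.normal(size=(7, 8))    # 7 target-word occurrences

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

similarity = normalize(src_vecs) @ normalize(tgt_vecs).T   # (5, 7)
nearest = similarity.argmax(axis=1)   # retrieved target index per source word
print(nearest)
```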
arXiv Detail & Related papers (2020-02-10T03:27:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.