Using CollGram to Compare Formulaic Language in Human and Neural Machine
Translation
- URL: http://arxiv.org/abs/2107.03625v1
- Date: Thu, 8 Jul 2021 06:30:35 GMT
- Title: Using CollGram to Compare Formulaic Language in Human and Neural Machine
Translation
- Authors: Yves Bestgen
- Abstract summary: A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences.
These differences were statistically significant and the effect sizes were almost always medium or large.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A comparison of formulaic sequences in human and neural machine translation
of quality newspaper articles shows that neural machine translations contain
fewer lower-frequency but strongly associated formulaic sequences, and more
high-frequency formulaic sequences. These differences were statistically
significant and the effect sizes were almost always medium or large. These
observations can be related to the differences between second language learners
of various levels and between translated and untranslated texts. The comparison
between the neural machine translation systems indicates that some systems
produce more formulaic sequences of both types than other systems.
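For context, CollGram characterizes each bigram of a text by association scores computed from a large reference corpus, typically mutual information, which highlights lower-frequency but strongly associated sequences, and the t-score, which highlights high-frequency collocations. The following is a minimal sketch of these two measures computed from raw counts; the toy reference counts and function names are illustrative assumptions, not CollGram's actual interface.

```python
import math
from collections import Counter

def bigram_scores(tokens, ref_unigrams, ref_bigrams, ref_size):
    """Score each bigram of `tokens` with PMI and t-score, using
    counts from a reference corpus (toy counts in this sketch)."""
    scores = []
    for w1, w2 in zip(tokens, tokens[1:]):
        c12 = ref_bigrams.get((w1, w2), 0)
        c1, c2 = ref_unigrams.get(w1, 0), ref_unigrams.get(w2, 0)
        if min(c12, c1, c2) == 0:
            scores.append((w1, w2, None, None))  # unseen in the reference corpus
            continue
        expected = c1 * c2 / ref_size
        pmi = math.log2(c12 / expected)        # high for rare, tightly bound pairs
        t = (c12 - expected) / math.sqrt(c12)  # high for frequent collocations
        scores.append((w1, w2, pmi, t))
    return scores

# Toy reference counts standing in for a large corpus such as COCA.
ref_unigrams = Counter({"of": 50000, "course": 800, "the": 90000, "answer": 600})
ref_bigrams = Counter({("of", "course"): 700, ("the", "answer"): 150})
print(bigram_scores(["of", "course", "the", "answer"],
                    ref_unigrams, ref_bigrams, ref_size=1_000_000))
```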
Related papers
- Comparing Formulaic Language in Human and Machine Translation: Insight
from a Parliamentary Corpus [0.0]
The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator.
The results confirm the observations on the news corpus, but the differences are less pronounced.
They suggest that the use of text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations.
arXiv Detail & Related papers (2022-06-22T08:59:10Z)
- How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that while relatively better performance is often observed when languages are more equally sampled, downstream performance is more robust to language imbalance than commonly expected.
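One common way to vary the data ratios among languages is temperature-based (exponentiated) sampling over the per-language corpus sizes; the sketch below illustrates that generic idea. The exponent value and the corpus sizes are assumptions for illustration, not the paper's actual setup.

```python
def sampling_ratios(corpus_sizes, alpha=0.3):
    """Exponentiated sampling: alpha=1 keeps the natural imbalance,
    alpha close to 0 approaches uniform sampling across languages."""
    scaled = {lang: n ** alpha for lang, n in corpus_sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}

# Illustrative corpus sizes (sentences) for a highly imbalanced setting.
sizes = {"en": 10_000_000, "fr": 1_000_000, "sw": 50_000}
for a in (1.0, 0.5, 0.2):
    print(a, sampling_ratios(sizes, alpha=a))
```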
arXiv Detail & Related papers (2022-04-29T17:50:36Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties for machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification [11.364204162881482]
We compare the traditional feature-engineering-based approach to the feature-learning-based one.
We investigate how well the hand-crafted features explain the variance in the neural models' predictions.
arXiv Detail & Related papers (2021-09-15T22:34:48Z)
- Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity [1.5229257192293197]
We show that machine translation and human translation can be classified with an accuracy above chance level.
The classification accuracy for machine translation is much higher than that for human translation.
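To make the setup concrete, one can classify translations from simple lexical-diversity features; the sketch below uses a type-token ratio and mean word length with a scikit-learn logistic regression. The features, the toy sentences and the classifier choice are hypothetical illustrations, not the study's actual pipeline.

```python
from sklearn.linear_model import LogisticRegression

def diversity_features(text):
    """Very simple lexical-diversity features: type-token ratio and
    mean word length (hypothetical stand-ins for the study's features)."""
    tokens = text.lower().split()
    ttr = len(set(tokens)) / len(tokens)
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    return [ttr, mean_len]

# Toy training data: 1 = machine translation, 0 = human translation.
texts = ["the cat sat on the mat the cat sat",
         "a lithe tabby settled onto the woven mat",
         "the man said that the man said it",
         "he repeated, almost verbatim, his earlier claim"]
labels = [1, 0, 1, 0]
clf = LogisticRegression().fit([diversity_features(t) for t in texts], labels)
print(clf.predict([diversity_features("the dog ran and the dog ran again")]))
```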
arXiv Detail & Related papers (2021-05-10T18:55:04Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Compared with several state-of-the-art non-autoregressive models, our model decodes significantly faster while maintaining translation quality.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Neural Representations for Modeling Variation in Speech [9.27189407857061]
We use neural models to compute word-based pronunciation differences between non-native and native speakers of English.
We show that speech representations extracted from a specific type of neural model (i.e. Transformers) lead to a better match with human perception than two earlier approaches.
arXiv Detail & Related papers (2020-11-25T11:19:12Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
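The Anti-Focal loss is an adaptation of the well-known focal loss to conditional text generation; as a point of reference only, the sketch below shows the standard focal loss on the probability of the gold token. The Anti-Focal weighting itself differs and is defined in the paper.

```python
import math

def focal_loss(p_correct, gamma=2.0):
    """Standard focal loss on the probability assigned to the gold token:
    (1 - p)^gamma down-weights tokens the model already predicts confidently.
    The paper's Anti-Focal loss modifies this confidence weighting for
    conditional generation; see the paper for its exact form."""
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

# Cross-entropy vs. focal loss for a confident and an unconfident token.
for p in (0.9, 0.1):
    print(f"p={p}: CE={-math.log(p):.3f}  focal={focal_loss(p):.3f}")
```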
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation [3.3194866396158]
We propose a simple generative noise model to generate adversarial examples of ten different types.
We show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data.
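As a rough illustration of what such a generative noise model can look like, the sketch below applies a few simple orthographic and punctuation perturbations to a sentence. These particular noise types and probabilities are assumptions for illustration, not the ten types proposed in the paper.

```python
import random

def add_noise(sentence, p=0.1, seed=0):
    """Apply simple orthographic/punctuation noise: swap adjacent
    characters or drop punctuation, each with a small probability."""
    rng = random.Random(seed)
    chars = list(sentence)
    out, i = [], 0
    while i < len(chars):
        r = rng.random()
        if r < p and i + 1 < len(chars):          # swap adjacent characters
            out.extend([chars[i + 1], chars[i]])
            i += 2
        elif r < 2 * p and chars[i] in ",.;:!?":  # drop punctuation
            i += 1
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

print(add_noise("They arrived late, but they stayed long."))
```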
arXiv Detail & Related papers (2020-09-11T14:12:54Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
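XMI contrasts how well the target can be predicted with and without the source: the target's cross-entropy under a target-side language model minus its cross-entropy under the translation model. A minimal sketch of that computation from per-token log-probabilities follows; the function names and the toy numbers are assumptions about how one might organize it, not the authors' released code.

```python
def cross_entropy(logprobs):
    """Average negative log-probability (nats per token) over a corpus,
    given per-token log-probabilities from some model."""
    n = sum(len(sent) for sent in logprobs)
    return -sum(lp for sent in logprobs for lp in sent) / n

def xmi(lm_logprobs, mt_logprobs):
    """Cross-mutual information: H_LM(target) - H_MT(target | source).
    Larger values mean the source helps more, i.e. translating into
    this target is 'easier' in the information-theoretic sense."""
    return cross_entropy(lm_logprobs) - cross_entropy(mt_logprobs)

# Toy per-token log-probs for two target sentences (hypothetical numbers).
lm = [[-2.1, -3.0, -1.7], [-2.5, -2.8]]
mt = [[-1.2, -1.9, -0.8], [-1.4, -1.6]]
print(f"XMI = {xmi(lm, mt):.3f} nats/token")
```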
arXiv Detail & Related papers (2020-05-05T17:38:48Z)