Automatic Discrimination of Human and Neural Machine Translation in
Multilingual Scenarios
- URL: http://arxiv.org/abs/2305.19757v1
- Date: Wed, 31 May 2023 11:41:24 GMT
- Authors: Malina Chichirau, Rik van Noord and Antonio Toral
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the task of automatically discriminating between human and machine
translations. As opposed to most previous work, we perform experiments in a
multilingual setting, considering multiple languages and multilingual
pretrained language models. We show that a classifier trained on parallel data
with a single source language (in our case German-English) can still perform
well on English translations that come from different source languages, even
when the machine translations were produced by systems other than the one it
was trained on. Additionally, we demonstrate that incorporating the source text
in the input of a multilingual classifier improves (i) its accuracy and (ii)
its robustness on cross-system evaluation, compared to a monolingual
classifier. Furthermore, we find that using training data from multiple source
languages (German, Russian, and Chinese) tends to improve the accuracy of both
monolingual and multilingual classifiers. Finally, we show that bilingual
classifiers and classifiers trained on multiple source languages benefit from
being trained on longer text sequences, rather than on sentences.
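No code accompanies this listing; the sketch below illustrates the bilingual input format the abstract describes, assuming XLM-R fine-tuned through Hugging Face transformers (the backbone choice and label convention are assumptions, not details from the paper):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed backbone; the paper experiments with multilingual pretrained LMs.
tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # assumed: 0 = human, 1 = machine

src = "Das ist nur ein Beispiel."   # German source
tgt = "This is just an example."    # English translation under scrutiny

# Bilingual input: source and translation encoded as one sentence pair.
# For the longer-sequence variant, consecutive sentences of a document
# would be concatenated up to max_length instead of single sentences.
enc = tok(src, tgt, truncation=True, max_length=512, return_tensors="pt")

# The classification head is untrained here; fine-tuning on labelled
# human/machine translation pairs is required before predictions mean anything.
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1).item()
print("machine" if pred == 1 else "human")
```

Encoding the source and the translation as a sentence pair is what distinguishes the bilingual classifier from the monolingual one, which would encode the translation alone.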
Related papers
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
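A minimal sketch of the translate-and-test idea from the entry above, assuming an off-the-shelf Marian MT model and an English sentiment classifier (both illustrative stand-ins, not T3L's actual components):

```python
from transformers import pipeline

# Illustrative stand-ins; T3L's actual translator and classifier differ.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

def translate_and_test(text_de: str) -> dict:
    # Stage 1: translate the non-English input into English.
    text_en = translator(text_de)[0]["translation_text"]
    # Stage 2: classify the translation with an English-only model.
    return classifier(text_en)[0]

print(translate_and_test("Der Film war überraschend gut."))
```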
- Revisiting Machine Translation for Cross-lingual Classification
Most research in the area focuses on multilingual models rather than on the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
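A hedged sketch of the bottleneck-adapter pattern the entry above builds on, in plain PyTorch; the dimensions and per-family routing are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck module inserted after a frozen transformer sub-layer."""
    def __init__(self, d_model: int = 1024, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behaviour recoverable.
        return hidden + self.up(self.act(self.down(hidden)))

# One adapter per language family; only these parameters would be trained,
# while the mBART-50 backbone stays frozen.
family_adapters = nn.ModuleDict({
    "germanic": BottleneckAdapter(),
    "slavic":   BottleneckAdapter(),
})

h = torch.randn(2, 16, 1024)          # (batch, seq_len, d_model)
out = family_adapters["germanic"](h)  # routed by the input's language family
print(out.shape)                      # torch.Size([2, 16, 1024])
```

Because only the small adapter matrices receive gradients, per-family fine-tuning stays cheap compared to updating the whole mBART-50 backbone.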
- Informative Language Representation Learning for Massively Multilingual Neural Machine Translation
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models toward the correct translation direction.
We propose two methods, language embedding embodiment and language-aware multi-head attention, that learn informative language representations to channel translation in the right direction.
arXiv Detail & Related papers (2022-09-04T04:27:17Z)
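The language-token mechanism discussed in the entry above reduces to prepending one artificial token; a toy sketch (the `<2xx>` token format is one common convention, assumed here):

```python
def add_lang_token(src_tokens: list[str], tgt_lang: str) -> list[str]:
    """Prepend an artificial token telling the model which language to produce."""
    return [f"<2{tgt_lang}>"] + src_tokens

print(add_lang_token(["Hallo", "Welt", "!"], "en"))
# ['<2en>', 'Hallo', 'Welt', '!']
```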
- Breaking Down Multilingual Machine Translation
We show that multilingual training benefits encoders in general, while it benefits decoders only for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and call each group a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
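The clustering step from the entry above can be approximated with off-the-shelf k-means; a minimal sketch with placeholder vectors standing in for representations from a multilingual pre-trained model:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
langs = ["de", "nl", "sv", "ru", "uk", "pl"]
# Placeholder vectors; in practice one per language, extracted from the
# multilingual pre-trained model.
lang_vecs = rng.normal(size=(len(langs), 768))

groups = KMeans(n_clusters=2, n_init=10).fit_predict(lang_vecs)
for lang, g in zip(langs, groups):
    print(f"{lang} -> representation sprachbund {g}")
```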
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
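The analysis in the entry above can be approximated with plain CCA (singular vector CCA additionally reduces each view with an SVD first); a sketch on synthetic data, where the two views stand in for typological features and learned embeddings:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Two views of the same 30 languages: e.g. typological features (view_a)
# and embeddings learned by an NMT model (view_b). Synthetic here.
view_a = rng.normal(size=(30, 10))
view_b = rng.normal(size=(30, 50))

cca = CCA(n_components=5)
a_c, b_c = cca.fit_transform(view_a, view_b)
# Correlation of each canonical pair measures information shared across views.
corrs = [float(np.corrcoef(a_c[:, i], b_c[:, i])[0, 1]) for i in range(5)]
print(corrs)
```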
This list is automatically generated from the titles and abstracts of the papers on this site.