On the Influence of Machine Translation on Language Origin Obfuscation
- URL: http://arxiv.org/abs/2106.12830v1
- Date: Thu, 24 Jun 2021 08:33:24 GMT
- Title: On the Influence of Machine Translation on Language Origin Obfuscation
- Authors: Benjamin Murauer, Michael Tschuggnall, Günther Specht
- Abstract summary: We analyze the ability to detect the source language from the translated output of two widely used commercial machine translation systems.
Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text.
- Score: 0.3437656066916039
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the last decade, machine translation has become a popular means to deal
with multilingual digital content. As translation quality improves,
obfuscating the source language of a text becomes more attractive. In this
paper, we analyze the ability to detect the source language from the translated
output of two widely used commercial machine translation systems by utilizing
machine-learning algorithms with basic textual features like n-grams.
Evaluations show that the source language can be reconstructed with high
accuracy for documents that contain a sufficient amount of translated text. In
addition, we analyze how the document size influences the performance of the
prediction, as well as how limiting the set of possible source languages
improves the classification accuracy.
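The abstract describes classifying the source language of machine-translated text with basic features such as n-grams. A minimal, stdlib-only sketch of that idea is below: each document is represented by its character n-gram counts, and the source language is predicted by cosine similarity against per-language n-gram profiles. The training snippets, labels, and query are illustrative placeholders, not the paper's data or its exact classifier.

```python
# Hedged sketch: character n-gram profiles + cosine similarity as a
# simple source-language detector for translated English text.
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    common = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical profiles: English MT output grouped by source language.
# A real system would build these from many translated documents.
profiles = {
    "de": char_ngrams("the meeting takes already place since monday in the hall"),
    "fr": char_ngrams("we thank you very much for the letter of yesterday evening"),
}

def predict_source(text):
    """Return the profile language most similar to the input text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))

print(predict_source("the talk takes place since tuesday"))
```

As the abstract notes, longer documents give such surface-feature classifiers more n-gram evidence, which is consistent with accuracy improving with document size.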
Related papers
- Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features [18.76505158652759]
We propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation.
On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features.
On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation.
arXiv Detail & Related papers (2024-08-02T17:10:12Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- NSOAMT -- New Search Only Approach to Machine Translation [0.0]
A "new search only approach to machine translation" was adopted to tackle some of the slowness and inaccuracy of other technologies.
The idea is to develop a solution that, by indexing an incremental set of words that combine a certain semantic meaning, makes it possible to create a process of correspondence between their native language record and the language of translation.
arXiv Detail & Related papers (2023-09-19T11:12:21Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z)
- Machine Translation for Accessible Multi-Language Text Analysis [1.5484595752241124]
We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy.
We show this for three major analytics -- sentiment analysis, topic analysis, and word embeddings -- over 16 languages.
arXiv Detail & Related papers (2023-01-20T04:11:38Z)
- Informative Language Representation Learning for Massively Multilingual Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models in the right translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation in the right directions.
arXiv Detail & Related papers (2022-09-04T04:27:17Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- A Survey of Orthographic Information in Machine Translation [1.2124289787900182]
We show how orthographic information can be used to improve machine translation of under-resourced languages.
We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods.
arXiv Detail & Related papers (2020-08-04T07:59:02Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
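The summary above notes that translating the premise and hypothesis independently can reduce their lexical overlap, since the same source word may be rendered differently in each translation. A small stdlib sketch of how that effect could be measured with Jaccard token overlap is below; the sentence pairs are invented for illustration and are not from the paper.

```python
# Hedged sketch: Jaccard token overlap as a simple proxy for the
# lexical-overlap reduction caused by independent translation.
def jaccard(a, b):
    """Jaccard overlap between the token sets of two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Translated jointly (consistent word choices across the pair):
joint = jaccard("the child plays in the garden",
                "the child plays outside")

# Translated independently (divergent word choices for the same words):
independent = jaccard("the child plays in the garden",
                      "the kid is playing outside")

print(joint > independent)  # → True
```

Models trained on jointly translated pairs can latch onto such overlap cues, which is one way translation artifacts distort cross-lingual evaluation.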
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.