A Survey of Orthographic Information in Machine Translation
- URL: http://arxiv.org/abs/2008.01391v1
- Date: Tue, 4 Aug 2020 07:59:02 GMT
- Title: A Survey of Orthographic Information in Machine Translation
- Authors: Bharathi Raja Chakravarthi, Priya Rani, Mihael Arcan and John P.
McCrae
- Abstract summary: We show how orthographic information can be used to improve machine translation of under-resourced languages.
We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods.
- Score: 1.2124289787900182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine translation is one of the applications of natural language processing
which has been explored in different languages. Recently, researchers have started
paying attention to machine translation for resource-poor languages and
closely related languages. A widespread and underlying problem for these
machine translation systems is the variation in orthographic conventions, which
causes many issues for traditional approaches. Two languages written in two
different orthographies are not easily comparable, but orthographic information
can also be used to improve the machine translation system. This article offers
a survey of research regarding orthography's influence on machine translation
of under-resourced languages. It introduces under-resourced languages in terms
of machine translation and how orthographic information can be utilised to
improve machine translation. We describe previous work in this area, discussing
what underlying assumptions were made, and showing how orthographic knowledge
improves the performance of machine translation of under-resourced languages.
We discuss different types of machine translation and demonstrate a recent
trend that seeks to link orthographic information with well-established machine
translation methods. Considerable attention is given to current efforts to exploit
cognate information at different levels of machine translation and the lessons
that can be drawn from this. Additionally, multilingual neural machine
translation of closely related languages is given a particular focus in this
survey. This article ends with a discussion of the way forward in machine
translation with orthographic information, focusing on multilingual settings
and bilingual lexicon induction.
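To make the survey's central theme concrete, the following is a minimal, illustrative sketch (not taken from the paper) of scoring surface-level orthographic similarity to propose cognate pairs for bilingual lexicon induction; the word lists, threshold, and function names are assumptions, and cross-script language pairs would first need transliteration into a shared orthography.

# Minimal sketch: rank candidate cognate pairs by orthographic similarity.
# Word lists, threshold, and names are illustrative assumptions only.
from difflib import SequenceMatcher

def orthographic_similarity(src_word: str, tgt_word: str) -> float:
    # Ratio of matching character subsequences, in [0, 1].
    return SequenceMatcher(None, src_word.lower(), tgt_word.lower()).ratio()

def candidate_cognates(src_vocab, tgt_vocab, threshold=0.6):
    # Yield (source, target, score) pairs whose surface forms look related.
    for s in src_vocab:
        for t in tgt_vocab:
            score = orthographic_similarity(s, t)
            if score >= threshold:
                yield s, t, score

portuguese = ["noite", "livro", "cidade"]   # hypothetical source vocabulary
spanish = ["noche", "libro", "ciudad"]      # hypothetical target vocabulary
for pair in sorted(candidate_cognates(portuguese, spanish), key=lambda p: -p[2]):
    print(pair)

A thresholded similarity of this kind is only a starting point; the surveyed work combines such signals with subword segmentation, transliteration, and learned embeddings.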
Related papers
- On the Copying Problem of Unsupervised NMT: A Training Schedule with a
Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
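As a rough sketch of how a discriminator term can be folded into a translation objective (not the paper's actual training schedule), the snippet below adds a weighted language-classification loss to a token-level cross-entropy loss; the models, tensors, and weight are placeholders.

# Hedged sketch: translation loss plus a weighted language-discriminator loss.
import torch
import torch.nn.functional as F

def combined_loss(translation_logits, target_ids, disc_logits, target_lang_ids, weight=0.1):
    # Standard token-level cross-entropy for the translation objective.
    nmt_loss = F.cross_entropy(
        translation_logits.view(-1, translation_logits.size(-1)),
        target_ids.view(-1))
    # Discriminator term penalises outputs classified as the wrong language.
    disc_loss = F.cross_entropy(disc_logits, target_lang_ids)
    return nmt_loss + weight * disc_loss

logits = torch.randn(2, 5, 100)          # batch x length x vocab (placeholder)
targets = torch.randint(0, 100, (2, 5))
d_logits = torch.randn(2, 2)             # two candidate languages (placeholder)
d_targets = torch.tensor([0, 1])
print(combined_loss(logits, targets, d_logits, d_targets))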
arXiv Detail & Related papers (2023-05-26T18:14:23Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms strong few-shot prompting with the BLOOM model, with an average improvement of 8 chrF++ points across the examined languages.
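As a rough illustration of the chunk-wise idea summarised above (not the DecoMT implementation itself), the sketch below splits a sentence into short word chunks and translates each with a stand-in for a few-shot-prompted LLM; the chunk size and the translate_chunk stub are hypothetical.

# Rough sketch of decomposed, chunk-wise translation; the LLM call is stubbed.
from typing import Callable, List

def word_chunks(sentence: str, size: int = 3) -> List[str]:
    # Split the source sentence into consecutive chunks of `size` words.
    words = sentence.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def decomposed_translate(sentence: str, translate_chunk: Callable[[str], str]) -> str:
    # Translate each chunk independently and join the partial translations.
    return " ".join(translate_chunk(chunk) for chunk in word_chunks(sentence))

# Stand-in for a few-shot-prompted LLM; a real system would prompt a model here.
stub_llm = lambda chunk: f"[translation of: {chunk}]"
print(decomposed_translate("closely related languages often share orthography", stub_llm))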
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Informative Language Representation Learning for Massively Multilingual Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models in the right translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation into the right directions.
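For background, the artificial language token mentioned above is commonly realised by prepending a target-language tag to every source sentence; the sketch below shows that baseline convention with hypothetical tags and data, not the representation-learning methods this paper proposes.

# Sketch of the standard target-language-token convention in multilingual NMT.
# The tag format and example sentence are hypothetical.
def language_token(lang_code: str) -> str:
    # Artificial token such as "<2hi>", read as "translate into Hindi".
    return f"<2{lang_code}>"

def prepend_language_token(src_sentence: str, tgt_lang: str) -> str:
    # One shared model can serve many directions if each input carries its tag.
    return f"{language_token(tgt_lang)} {src_sentence}"

print(prepend_language_token("orthography matters for related languages", "hi"))
# -> "<2hi> orthography matters for related languages"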
arXiv Detail & Related papers (2022-09-04T04:27:17Z)
- On the Influence of Machine Translation on Language Origin Obfuscation [0.3437656066916039]
We analyze the ability to detect the source language from the translated output of two widely used commercial machine translation systems.
Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text.
arXiv Detail & Related papers (2021-06-24T08:33:24Z)
- Extremely low-resource machine translation for closely related languages [0.0]
This work focuses on closely related languages from the Uralic language family: Estonian and Finnish.
We find that multilingual learning and synthetic corpora increase the translation quality in every language pair.
We show that transfer learning and fine-tuning are very effective for low-resource machine translation and achieve the best results.
arXiv Detail & Related papers (2021-05-27T11:27:06Z)
- A Framework for Hierarchical Multilingual Machine Translation [3.04585143845864]
This paper presents a hierarchical framework for building multilingual machine translation strategies.
It takes advantage of a typological language family tree for enabling transfer among similar languages.
Exhaustive experimentation on a dataset with 41 languages demonstrates the validity of the proposed framework.
arXiv Detail & Related papers (2020-05-12T01:24:43Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
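To give a flavour of the multi-view analysis summarised above, the snippet below runs a PCA-then-CCA comparison (an SVCCA-style procedure, not the paper's exact code) on two placeholder matrices of per-language vectors; the shapes and random data are assumptions.

# SVCCA-style sketch: reduce each view with PCA, then correlate with CCA.
# The per-language matrices here are random placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
view_a = rng.normal(size=(40, 32))  # e.g., 40 languages x 32 typology features
view_b = rng.normal(size=(40, 64))  # same 40 languages x 64 learned dimensions

a_red = PCA(n_components=10).fit_transform(view_a)  # "singular vector" step
b_red = PCA(n_components=10).fit_transform(view_b)

cca = CCA(n_components=5)
cca.fit(a_red, b_red)
a_c, b_c = cca.transform(a_red, b_red)
corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(5)]
print("canonical correlations:", np.round(corrs, 3))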
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
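As a rough illustration of the lexical-overlap effect mentioned above, the snippet below computes a token-level Jaccard overlap between a premise and a hypothesis; the tokenisation and example pair are assumptions, not the paper's measurement.

# Rough sketch: token-level Jaccard overlap between an NLI premise and hypothesis.
# Translating the two sides independently tends to lower this value.
def jaccard_overlap(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(p | h) if p | h else 0.0

print(jaccard_overlap("a man is playing a guitar",
                      "a man is playing an instrument"))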
arXiv Detail & Related papers (2020-04-09T17:54:30Z)