Translating from Morphologically Complex Languages: A Paraphrase-Based
Approach
- URL: http://arxiv.org/abs/2109.13724v1
- Date: Mon, 27 Sep 2021 07:02:19 GMT
- Title: Translating from Morphologically Complex Languages: A Paraphrase-Based
Approach
- Authors: Preslav Nakov, Hwee Tou Ng
- Abstract summary: We treat morphologically related words as potential paraphrases and handle them using paraphrasing techniques at the word, phrase, and sentence level.
Experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches.
- Score: 45.900339652085584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach to translating from a morphologically complex
language. Unlike previous research, which has targeted word inflections and
concatenations, we focus on the pairwise relationship between morphologically
related words, which we treat as potential paraphrases and handle using
paraphrasing techniques at the word, phrase, and sentence level. An important
advantage of this framework is that it can cope with derivational morphology,
which has so far remained largely beyond the capabilities of statistical
machine translation systems. Our experiments translating from Malay, whose
morphology is mostly derivational, into English show significant improvements
over rivaling approaches based on five automatic evaluation measures (for
320,000 sentence pairs; 9.5 million English word tokens).
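As a rough illustration of what "pairs of morphologically related words" could look like at the word level, the toy sketch below groups Malay word forms that share an attested stem and emits them as candidate paraphrase pairs. This is not the authors' procedure: the affix lists, the function names `candidate_stems` and `paraphrase_candidates`, and the sample vocabulary are all assumptions made for illustration, and a real system would presumably also need to score or filter such pairs.

```python
from itertools import combinations

# Hypothetical, deliberately tiny affix lists (illustration only, not from the paper).
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "peng", "pe")
SUFFIXES = ("kan", "an", "i")


def candidate_stems(word, vocab):
    """Return attested stems reachable by stripping at most one prefix and one suffix."""
    forms = {word}
    for prefix in PREFIXES:
        if word.startswith(prefix):
            forms.add(word[len(prefix):])
    # Iterate over a snapshot so that new forms can be added safely.
    for form in list(forms):
        for suffix in SUFFIXES:
            if form.endswith(suffix):
                forms.add(form[:-len(suffix)])
    # Keep only stems that occur in the vocabulary, to avoid bogus splits
    # such as "makan" -> "mak".
    return {f for f in forms if f in vocab and f != word and len(f) >= 3}


def paraphrase_candidates(vocab):
    """Group words by shared stem and return all within-group word pairs."""
    groups = {}
    for word in vocab:
        for stem in candidate_stems(word, vocab):
            groups.setdefault(stem, {stem}).add(word)
    pairs = set()
    for words in groups.values():
        pairs.update(combinations(sorted(words), 2))
    return sorted(pairs)


# Assumed sample vocabulary: forms built on "ajar" (teach), plus unrelated words.
vocab = {"ajar", "mengajar", "diajar", "ajaran", "makan", "buku"}
pairs = paraphrase_candidates(vocab)
```

Requiring the stem to be attested in the vocabulary is a design choice of this sketch; it trades recall for fewer spurious pairs.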
Related papers
- Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon [48.00488140516432]
We find evidence of a positive relationship between morphological irregularity and phonotactic complexity within languages.
We also find weak evidence of a negative relationship between word length and morphological irregularity.
arXiv Detail & Related papers (2024-06-07T18:09:21Z)
- A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation [8.30255326875704]
Subword regularisation boosts synergy in multilingual modelling, whereas BPE more effectively facilitates transfer during cross-lingual fine-tuning.
Our study confirms that decisions around subword modelling can be key to optimising the benefits of multilingual modelling.
arXiv Detail & Related papers (2024-03-29T13:09:23Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Morphologically Aware Word-Level Translation [82.59379608647147]
We propose a novel morphologically aware probability model for bilingual lexicon induction.
Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning.
arXiv Detail & Related papers (2020-11-15T17:54:49Z)
- Morphological Disambiguation from Stemming Data [1.2183405753834562]
Kinyarwanda, a morphologically rich language, currently lacks tools for automated morphological analysis.
We learn to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing.
Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
arXiv Detail & Related papers (2020-11-11T01:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.