Quantifying Synthesis and Fusion and their Impact on Machine Translation
- URL: http://arxiv.org/abs/2205.03369v1
- Date: Fri, 6 May 2022 17:04:58 GMT
- Title: Quantifying Synthesis and Fusion and their Impact on Machine Translation
- Authors: Arturo Oncevay and Duygu Ataman and Niels van Berkel and Barry Haddow
and Alexandra Birch and Johannes Bjerva
- Abstract summary: Literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish, and verbs in English-Spanish) and segment level.
- Score: 79.61874492642691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theoretical work in morphological typology offers the possibility of
measuring morphological diversity on a continuous scale. However, literature in
Natural Language Processing (NLP) typically labels a whole language with a
strict type of morphology, e.g. fusional or agglutinative. In this work, we
propose to reduce the rigidity of such claims, by quantifying morphological
typology at the word and segment level. We consider Payne (2017)'s approach to
classify morphology using two indices: synthesis (e.g. analytic to
polysynthetic) and fusion (agglutinative to fusional). For computing synthesis,
we test unsupervised and supervised morphological segmentation methods for
English, German and Turkish, whereas for fusion, we propose a semi-automatic
method using Spanish as a case study. Then, we analyse the relationship between
machine translation quality and the degree of synthesis and fusion at word
(nouns and verbs for English-Turkish, and verbs in English-Spanish) and segment
level (previous language pairs plus English-German in both directions). We
complement the word-level analysis with human evaluation, and overall, we
observe a consistent impact of both indexes on machine translation quality.
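As a rough illustration of what a word-level synthesis measure can look like, the sketch below computes the average number of morphemes per word from already-segmented input. This morphemes-per-word ratio is the classic index of synthesis from the typology literature; the abstract does not spell out the paper's exact formulation, so the function name `synthesis_index` and the hand-made segmentations are assumptions for illustration only, not the authors' method or data.

```python
from statistics import mean


def synthesis_index(segmented_words):
    """Return the average number of morphemes per word.

    `segmented_words` is a list of words, each given as a list of
    morpheme strings (the output of any morphological segmenter).
    """
    if not segmented_words:
        raise ValueError("need at least one segmented word")
    return mean(len(morphemes) for morphemes in segmented_words)


# Hypothetical, hand-segmented examples (not taken from the paper).
english = [["the"], ["dog", "s"], ["walk", "ed"]]
turkish = [["ev", "ler", "imiz", "de"], ["gel", "di", "m"]]

print("English:", synthesis_index(english))
print("Turkish:", synthesis_index(turkish))
```

A higher value points towards the synthetic end of the scale and a value close to 1 towards the analytic end. The same function can be applied to a single word, a sentence, or a whole corpus, depending on which words are passed in.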
Related papers
- Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
(2023-07-06T10:52:22Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
(2022-03-25T10:13:20Z)
- Morphology Without Borders: Clause-Level Morphological Annotation [8.559428282730021]
We propose to view morphology as a clause-level phenomenon, rather than word-level.
We deliver a novel dataset for clause-level morphology covering 4 typologically-different languages: English, German, Turkish and Hebrew.
Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages.
(2022-02-25T17:20:28Z)
- Translating from Morphologically Complex Languages: A Paraphrase-Based Approach [45.900339652085584]
We treat the pairwise relationship between morphologically related words as potential paraphrases and handle it using paraphrasing techniques at the word, phrase, and sentence level.
Experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches.
(2021-09-27T07:02:19Z)
- How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology? [26.71325671956197]
We design a test suite to evaluate segmentation strategies on different types of morphological phenomena.
We find that learning to analyse and generate morphologically complex surface representations is still challenging.
(2021-09-02T17:23:21Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
(2020-04-29T03:34:53Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
(2019-04-04T02:03:19Z)