Exploring Linguistic Probes for Morphological Generalization
- URL: http://arxiv.org/abs/2310.13686v1
- Date: Fri, 20 Oct 2023 17:45:30 GMT
- Title: Exploring Linguistic Probes for Morphological Generalization
- Authors: Jordan Kodner, Salam Khalifa, Sarah Payne
- Abstract summary: Testing these probes on three morphologically distinct languages, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.
- Score: 11.568042812213712
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern work on the cross-linguistic computational modeling of morphological
inflection has typically employed language-independent data splitting
algorithms. In this paper, we supplement that approach with language-specific
probes designed to test aspects of morphological generalization. Testing these
probes on three morphologically distinct languages, English, Spanish, and
Swahili, we find evidence that three leading morphological inflection systems
employ distinct generalization strategies over conjugational classes and
feature sets on both orthographic and phonologically transcribed inputs.
Related papers
- Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement [1.4335183427838039]
We take the approach of developing curated synthetic data on a large scale, with specific properties.
We use a new multiple-choice task and datasets, Blackbird Language Matrices, to focus on a specific grammatical structural phenomenon.
We show that despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models have language-specific differences.
arXiv Detail & Related papers (2024-09-10T14:58:55Z) - Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology [0.0]
This paper explores variation in the expression of generic temporal subordination ('when'-clauses) among the languages of Latin America and the Caribbean.
It presents probabilistic semantic maps computed on the basis of the languages of the region, thus avoiding bias towards the many world's languages that exclusively use lexified connectors.
The approach allows capturing morphological clause-linkage devices in addition to lexified connectors, paving the way for larger-scale, strategy-agnostic analyses of typological variation in temporal subordination.
arXiv Detail & Related papers (2024-04-28T17:43:24Z) - Labeled Morphological Segmentation with Semi-Markov Models [127.69031138022534]
We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.
We additionally introduce a new hierarchy of morphotactic tagsets.
We develop modelname, a discriminative morphological segmentation system that explicitly models morphotactics.
arXiv Detail & Related papers (2024-04-13T12:51:53Z) - Language Embeddings Sometimes Contain Typological Generalizations [0.0]
We train neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages.
The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features.
We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations.
arXiv Detail & Related papers (2023-01-19T15:09:59Z) - Modeling Target-Side Morphology in Neural Machine Translation: A
Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z) - A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z) - Constructing a Family Tree of Ten Indo-European Languages with
Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z) - Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z) - A Systematic Analysis of Morphological Content in BERT Models for
Multiple Languages [2.345305607613153]
This work describes experiments which probe the hidden representations of several BERT-style models for morphological content.
The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages.
arXiv Detail & Related papers (2020-04-06T22:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.