SIGTYP 2020 Shared Task: Prediction of Typological Features
- URL: http://arxiv.org/abs/2010.08246v2
- Date: Mon, 26 Oct 2020 07:29:45 GMT
- Title: SIGTYP 2020 Shared Task: Prediction of Typological Features
- Authors: Johannes Bjerva and Elizabeth Salesky and Sabrina J. Mielke and Aditi
Chaudhary and Giuseppe G. A. Celano and Edoardo M. Ponti and Ekaterina
Vylomova and Ryan Cotterell and Isabelle Augenstein
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013)
contain information about linguistic properties of the world's languages. They
have been shown to be useful for downstream applications, including
cross-lingual transfer learning and linguistic probing. A major drawback
hampering broader adoption of typological KBs is that they are sparsely
populated, in the sense that most languages only have annotations for some
features, and skewed, in that few features have wide coverage. As typological
features often correlate with one another, it is possible to predict them and
thus automatically populate typological KBs, which is also the focus of this
shared task. Overall, the task attracted 8 submissions from 5 teams, out of
which the most successful methods make use of such feature correlations.
However, our error analysis reveals that even the strongest submitted systems
struggle with predicting feature values for languages where few features are
known.
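The abstract's premise is that correlated features let the known values of a language predict its missing ones. The minimal co-occurrence voting sketch below illustrates that premise. It is not the method of any submitted system. The function names, feature names, and toy languages are hypothetical. Each known (feature, value) pair of a language votes for the target-feature values it co-occurred with in the training languages. A language with no informative known features receives no prediction, which is the failure mode the error analysis points to.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical representation: each language maps feature names to observed values;
# unannotated features are simply absent (the sparsity the abstract describes).
Language = Dict[str, str]
CountTable = Dict[Tuple[str, str, str], Counter]


def cooccurrence_counts(train: Dict[str, Language]) -> CountTable:
    """For every ordered feature pair (g, f), count the values of f seen with each value v of g."""
    counts: CountTable = defaultdict(Counter)
    for feats in train.values():
        for g, v in feats.items():
            for f, t in feats.items():
                if f != g:
                    counts[(g, v, f)][t] += 1
    return counts


def predict_feature(known: Language, target: str, counts: CountTable) -> Optional[str]:
    """Each known (feature, value) pair votes for the target values it co-occurred with."""
    votes: Counter = Counter()
    for g, v in known.items():
        votes.update(counts.get((g, v, target), Counter()))
    if not votes:
        return None  # no correlated evidence: too few (or unseen) known features
    return votes.most_common(1)[0][0]


# Toy training data, invented for illustration (not taken from WALS).
train = {
    "lang_a": {"word_order": "SOV", "adpositions": "postpositions", "genitive": "GenN"},
    "lang_b": {"word_order": "SOV", "adpositions": "postpositions", "genitive": "GenN"},
    "lang_c": {"word_order": "SVO", "adpositions": "prepositions", "genitive": "NGen"},
    "lang_d": {"word_order": "VSO", "adpositions": "prepositions", "genitive": "NGen"},
}
counts = cooccurrence_counts(train)
print(predict_feature({"word_order": "SOV"}, "adpositions", counts))  # postpositions
print(predict_feature({"word_order": "SVO", "genitive": "NGen"}, "adpositions", counts))  # prepositions
print(predict_feature({}, "adpositions", counts))  # None
```

Pooling votes across all known features means that every extra annotation adds evidence. A language with only one or two known features, or with values unseen in training, has little or nothing to vote with.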
Related papers
- data2lang2vec: Data Driven Typological Features Completion (2024-09-25)
We introduce a multilingual part-of-speech (POS) tagger, achieving over 70% accuracy across 1,749 languages.
We also introduce a more realistic evaluation setup that focuses on typological features likely to be missing.
- Language Embeddings Sometimes Contain Typological Generalizations (2023-01-19)
We train neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages.
The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features.
We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations.
- Towards Zero-shot Language Modeling (2021-08-06)
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
- Does Typological Blinding Impede Cross-Lingual Sharing? (2021-01-28)
We show that a model trained in a cross-lingual setting will pick up on typological cues from the input data.
We investigate how cross-lingual sharing and performance are impacted.
- NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task (2020-10-12)
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures (2020-04-30)
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations (2020-04-30)
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
- Cross-lingual, Character-Level Neural Morphological Tagging (2017-08-30)
We train character-level recurrent neural taggers to predict morphological tags for high-resource and low-resource languages jointly.
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
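NEMO reports a micro-averaged accuracy. The sketch below assumes that micro averaging pools every held-out (language, feature) cell into a single accuracy. It contrasts this with a per-language macro average. The function names are hypothetical, and the shared task's official scorer may differ, for example in how it treats cells with no prediction (here they count as errors).

```python
from collections import defaultdict
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (language, feature) pair in the held-out test set


def micro_accuracy(gold: Dict[Cell, str], pred: Dict[Cell, str]) -> float:
    """Pool all gold cells; a missing prediction counts as an error (an assumption)."""
    if not gold:
        raise ValueError("no gold cells to score")
    correct = sum(1 for cell, value in gold.items() if pred.get(cell) == value)
    return correct / len(gold)


def macro_accuracy_by_language(gold: Dict[Cell, str], pred: Dict[Cell, str]) -> float:
    """Average per-language accuracies, so every language carries equal weight."""
    if not gold:
        raise ValueError("no gold cells to score")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for (lang, feat), value in gold.items():
        totals[lang] += 1
        hits[lang] += int(pred.get((lang, feat)) == value)
    return sum(hits[lang] / totals[lang] for lang in totals) / len(totals)


# Toy example: language "x" has three test cells (all correct), "y" has one (wrong).
gold = {("x", "f1"): "a", ("x", "f2"): "b", ("x", "f3"): "c", ("y", "f1"): "a"}
pred = {("x", "f1"): "a", ("x", "f2"): "b", ("x", "f3"): "c", ("y", "f1"): "z"}
print(micro_accuracy(gold, pred))              # 0.75
print(macro_accuracy_by_language(gold, pred))  # 0.5
```

Under micro averaging, languages with many test cells dominate the score. A per-language view can make failures on individual languages, such as sparsely annotated ones, more visible.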