LowResourceEval-2019: a shared task on morphological analysis for
low-resource languages
- URL: http://arxiv.org/abs/2001.11285v1
- Date: Thu, 30 Jan 2020 12:47:50 GMT
- Title: LowResourceEval-2019: a shared task on morphological analysis for
low-resource languages
- Authors: Elena Klyachko and Alexey Sorokin and Natalia Krizhanovskaya and
Andrew Krizhanovsky and Galina Ryazanskaya
- Abstract summary: The paper describes the results of the first shared task on morphological analysis for the languages of Russia, namely, Evenki, Karelian, Selkup, and Veps.
The tasks include morphological analysis, word form generation and morpheme segmentation.
The article describes the datasets prepared for the shared tasks and contains analysis of the participants' solutions.
- Score: 0.30998852056211795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper describes the results of the first shared task on morphological
analysis for the languages of Russia, namely, Evenki, Karelian, Selkup, and
Veps. For the languages in question, only small-sized corpora are available.
The tasks include morphological analysis, word form generation and morpheme
segmentation. Four teams participated in the shared task. Most of them use
machine-learning approaches, outperforming the existing rule-based ones. The
article describes the datasets prepared for the shared tasks and contains
analysis of the participants' solutions. Language corpora having different
formats were transformed into CoNLL-U format. The universal format makes the
datasets comparable to other language corpora and facilitates using them in
other NLP tasks.
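To make the CoNLL-U conversion mentioned in the abstract concrete, here is a minimal sketch of how a file in that format might be read. The ten column names follow the public CoNLL-U specification. The function names `parse_feats` and `parse_conllu` and the English sample sentence are assumptions made for illustration; they are not taken from the shared task data or from the organisers' tooling.

```python
# Minimal CoNLL-U reader (a sketch, not the shared task's actual tooling).
# Column names follow the CoNLL-U specification; the sample is invented.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_feats(value):
    """Turn 'Number=Plur|Person=3' into a dict; '_' means no features."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. '# text = ...') are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
        token = dict(zip(FIELDS, columns))
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences


# Hypothetical two-token sentence, used only to exercise the parser.
SAMPLE = (
    "# text = dogs bark\n"
    "1\tdogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\tNumber=Plur|Person=3\t0\troot\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["feats"])
```

Because every corpus shares the same ten tab-separated columns, a reader like this can be reused unchanged across languages, which is the comparability benefit the abstract describes.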
Related papers
- To token or not to token: A Comparative Study of Text Representations
for Cross-Lingual Transfer [23.777874316083984]
We propose a scoring Language Quotient metric capable of providing a weighted representation of both zero-shot and few-shot evaluation combined.
Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts.
In dependency parsing tasks, where word relationships play a crucial role, models with a character-level focus outperform others.
arXiv Detail & Related papers (2023-10-12T06:59:10Z)
- Assessing Linguistic Generalisation in Language Models: A Dataset for
Brazilian Portuguese [4.941630596191806]
We propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese.
These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions.
arXiv Detail & Related papers (2023-05-23T13:49:14Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several design alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and article bodies from language-aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
- Multilingual Text Classification for Dravidian Languages [4.264592074410622]
We propose a multilingual text classification framework for the Dravidian languages.
On the one hand, the framework uses the LaBSE pre-trained model as the base model.
On the other hand, because the model cannot adequately recognize and exploit the correlation among languages, we further propose a language-specific representation module.
arXiv Detail & Related papers (2021-12-03T04:26:49Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Provenance for Linguistic Corpora Through Nanopublications [0.22940141855172028]
Research in Computational Linguistics is dependent on text corpora for training and testing new tools and methodologies.
While there exists a plethora of annotated linguistic information, these corpora are often not interoperable without significant manual work.
This paper addresses this issue with a case study on event-annotated corpora and by creating a new, more interoperable representation of this data in the form of nanopublications.
arXiv Detail & Related papers (2020-06-11T11:30:30Z)
- Quda: Natural Language Queries for Visual Data Analytics [33.983060903399554]
We present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language.
Our dataset contains 14,035 diverse user queries, and each is annotated with one or multiple analytic tasks.
This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks.
arXiv Detail & Related papers (2020-05-07T05:35:16Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.