The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm
Completion
- URL: http://arxiv.org/abs/2005.13756v1
- Date: Thu, 28 May 2020 03:09:58 GMT
- Authors: Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden
- Abstract summary: In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion.
Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms.
We present an analysis here, so that this shared task will ground further research on the topic.
- Score: 28.728844366333185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we describe the findings of the SIGMORPHON 2020 shared task on
unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a
novel task in the field of inflectional morphology. Participants were asked to
submit systems which take raw text and a list of lemmas as input, and output
all inflected forms, i.e., the entire morphological paradigm, of each lemma. In
order to simulate a realistic use case, we first released data for 5
development languages. However, systems were officially evaluated on 9 surprise
languages, which were only revealed a few days before the submission deadline.
We provided a modular baseline system, which is a pipeline of 4 components. 3
teams submitted a total of 7 systems, but, surprisingly, none of the submitted
systems was able to improve over the baseline on average over all 9 test
languages. Only on 3 languages did a submitted system obtain the best results.
This shows that unsupervised morphological paradigm completion is still largely
unsolved. We present an analysis here, so that this shared task will ground
further research on the topic.
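To make the task's input and output concrete, here is a toy Python sketch. It is not the shared-task baseline or any submitted system. The function name `toy_paradigms`, the `min_stem` parameter, and the prefix-matching heuristic are all assumptions made for illustration: the sketch only retrieves word forms attested in the raw text that share a long prefix with each lemma.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def toy_paradigms(raw_text: str, lemmas: list[str], min_stem: int = 3) -> dict[str, list[str]]:
    """Map each lemma to the corpus words that share a long prefix with it.

    Hypothetical heuristic: a word is a candidate form if it shares at least
    max(min_stem, len(lemma) - 1) leading characters with the lemma
    (capped at the lemma's own length).
    """
    # Vocabulary of the unlabeled text: lowercased, whitespace-tokenized, sorted.
    vocab = sorted(set(raw_text.lower().split()))
    paradigms = {}
    for lemma in lemmas:
        threshold = min(max(min_stem, len(lemma) - 1), len(lemma))
        paradigms[lemma] = [
            word for word in vocab if common_prefix_len(lemma, word) >= threshold
        ]
    return paradigms


if __name__ == "__main__":
    text = "she walks home . he walked away . they are walking . talk talks"
    for lemma, forms in toy_paradigms(text, ["walk", "talk"]).items():
        print(lemma, forms)
```

A real system must also generate forms that never occur in the raw text and decide how many paradigm slots each lemma has; this sketch attempts neither.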
Related papers
- SADAS: A Dialogue Assistant System Towards Remediating Norm Violations
in Bilingual Socio-Cultural Conversations [56.31816995795216]
The Socially-Aware Dialogue Assistant System (SADAS) is designed to ensure that conversations unfold with respect and understanding.
Our system's novel architecture includes: (1) identifying the categories of norms present in the dialogue, (2) detecting potential norm violations, (3) evaluating the severity of these violations, and (4) implementing targeted remedies to rectify the breaches.
(2024-01-29T08:54:21Z)
- Look Before You Leap: A Universal Emergent Decomposition of Retrieval
Tasks in Language Models [58.57279229066477]
We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
(2023-12-13T18:36:43Z)
- DOMINO: A Dual-System for Multi-step Visual Language Reasoning [76.69157235928594]
We propose a dual-system for multi-step multimodal reasoning, which consists of a "System-1" step for visual information extraction and a "System-2" step for deliberate reasoning.
Our method with a pre-trained System-2 module performs competitively compared to prior work on in- and out-of-distribution data.
(2023-10-04T13:29:47Z)
- The SIGMORPHON 2022 Shared Task on Morpheme Segmentation [39.44280269663147]
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes.
The best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute.
To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
(2022-06-15T15:57:22Z)
- UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
(2022-05-07T09:19:02Z)
- Morphology Without Borders: Clause-Level Morphological Annotation [8.559428282730021]
We propose to view morphology as a clause-level phenomenon, rather than word-level.
We deliver a novel dataset for clause-level morphology covering 4 typologically-different languages: English, German, Turkish and Hebrew.
Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages.
(2022-02-25T17:20:28Z)
- The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2 [25.234256237085336]
We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion.
The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form.
The latter requires generating entire paradigms for a set of given lemmas from raw text alone.
(2020-06-21T15:41:58Z)
- SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological
Inflection [81.85463892070085]
The SIGMORPHON 2020 task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages.
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
(2020-06-20T13:24:14Z)
- The IMS-CUBoulder System for the SIGMORPHON 2020 Shared Task on
Unsupervised Morphological Paradigm Completion [27.37360427124081]
We present the systems of the University of Stuttgart IMS and the University of Colorado Boulder for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion.
The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text.
Our pointer-generator system obtains the best score of all seven submitted systems on average over all languages, and outperforms the official baseline, which was best overall, on Bulgarian and Kannada.
(2020-05-25T21:23:52Z)
- The Paradigm Discovery Problem [121.79963594279893]
We formalize the paradigm discovery problem and develop metrics for judging systems.
We report empirical results on five diverse languages.
Our code and data are available for public use.
(2020-05-04T16:38:54Z)
- Unsupervised Morphological Paradigm Completion [26.318483685612765]
Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas.
We introduce a system for the task, which generates morphological paradigms via the following steps: (i) EDIT TREE retrieval, (ii) additional lemma retrieval, (iii) paradigm size discovery, and (iv) inflection generation.
Our system outperforms trivial baselines with ease and, for some languages, even obtains a higher accuracy than minimally supervised systems.
(2020-05-03T02:56:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.