Morphological Reinflection with Multiple Arguments: An Extended
Annotation schema and a Georgian Case Study
- URL: http://arxiv.org/abs/2203.08527v1
- Date: Wed, 16 Mar 2022 10:47:29 GMT
- Title: Morphological Reinflection with Multiple Arguments: An Extended
Annotation schema and a Georgian Case Study
- Authors: David Guriel, Omer Goldman, Reut Tsarfaty
- Abstract summary: We extend the UniMorph morphological dataset to cover verbs that agree with multiple arguments using true affixes.
The dataset has 4 times more tables and 6 times more verb forms compared to the existing UniMorph dataset.
It is expected to improve the coverage, consistency and interpretability of this benchmark.
- Score: 7.245355976804435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, a flurry of morphological datasets has emerged, most notably
UniMorph, a multi-lingual repository of inflection tables. However, the flat
structure of the current morphological annotation schemas makes the treatment
of some languages quirky, if not impossible, specifically in cases of
polypersonal agreement. In this paper we propose a general solution for such
cases and expand the UniMorph annotation schema to naturally address this
phenomenon, in which verbs agree with multiple arguments using true affixes. We
apply this extended schema to one such language, Georgian, and provide a
human-verified, accurate and balanced morphological dataset for Georgian verbs.
The dataset has 4 times more tables and 6 times more verb forms compared to the
existing UniMorph dataset, covering all possible variants of argument marking,
demonstrating the adequacy of our proposed scheme. Experiments with a standard
reinflection model show that generalization is easy when the data is split at
the form level, but extremely hard when splitting along lemma lines. Expanding
the other languages in UniMorph to this schema is expected to improve the
coverage, consistency, and interpretability of this benchmark.
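To make the form-level versus lemma-level distinction above concrete, here is a minimal Python sketch of the two splitting strategies over toy (lemma, feature bundle, form) triples. The lemmas, forms, and the ARG-prefixed agreement tags are illustrative placeholders only; they are not real Georgian data and not the exact notation of the proposed schema.

import random

# Toy inflection entries: (lemma, feature bundle, inflected form).
# The ARG-prefixed agreement tags are placeholders sketching how
# polypersonal agreement might be encoded; they are not the paper's
# exact notation, and the lemmas/forms are not real Georgian data.
ENTRIES = [
    ("lemma1", "V;PRS;ARG-SUBJ-1SG;ARG-OBJ-3SG", "form1a"),
    ("lemma1", "V;PRS;ARG-SUBJ-2SG;ARG-OBJ-3PL", "form1b"),
    ("lemma2", "V;PST;ARG-SUBJ-3SG;ARG-OBJ-1SG", "form2a"),
    ("lemma2", "V;PST;ARG-SUBJ-3PL;ARG-OBJ-2SG", "form2b"),
]

def form_level_split(entries, test_ratio=0.25, seed=0):
    # Shuffle individual forms; other forms of a test lemma may remain in train.
    rng = random.Random(seed)
    shuffled = entries[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def lemma_level_split(entries, test_ratio=0.25, seed=0):
    # Hold out whole lemmas, so every test lemma is unseen during training.
    rng = random.Random(seed)
    lemmas = sorted({lemma for lemma, _, _ in entries})
    rng.shuffle(lemmas)
    cut = int(len(lemmas) * (1 - test_ratio))
    train_lemmas = set(lemmas[:cut])
    train = [e for e in entries if e[0] in train_lemmas]
    test = [e for e in entries if e[0] not in train_lemmas]
    return train, test

if __name__ == "__main__":
    for name, splitter in [("form-level", form_level_split),
                           ("lemma-level", lemma_level_split)]:
        train, test = splitter(ENTRIES)
        print(name, "train:", len(train), "test:", len(test))

Under the lemma-level split every test lemma is unseen at training time, so a model must transfer the affixal agreement patterns to new stems; this matches the abstract's observation that the lemma split is far harder than the form split.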
Related papers
- Improving Generalization in Semantic Parsing by Increasing Natural
Language Variation [67.13483734810852]
In this work, we use data augmentation to enhance robustness of text-to-SQL parsing.
We leverage the capabilities of large language models to generate more realistic and diverse questions.
Using only a few prompts, we achieve a two-fold increase in the number of questions in Spider.
arXiv Detail & Related papers (2024-02-13T18:48:23Z)
- Morphosyntactic probing of multilingual BERT models [41.83131308999425]
We introduce an extensive dataset for multilingual probing of morphological information in language models.
We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks.
arXiv Detail & Related papers (2023-06-09T19:15:20Z)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
- UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z)
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that existing methods fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z)
- Morphology Without Borders: Clause-Level Morphological Annotation [8.559428282730021]
We propose to view morphology as a clause-level phenomenon, rather than word-level.
We deliver a novel dataset for clause-level morphology covering 4 typologically-different languages: English, German, Turkish and Hebrew.
Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages.
arXiv Detail & Related papers (2022-02-25T17:20:28Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Minimal Supervision for Morphological Inflection [8.532288965425805]
We bootstrap labeled data from a seed of as little as five labeled paradigms, accompanied by a large bulk of unlabeled text.
Our approach exploits different kinds of regularities in morphological systems in a two-phased setup.
We experiment with the Paradigm Cell Filling Problem over eight typologically different languages, and find that, in languages with relatively simple morphology, orthographic regularities on their own allow inflection models to achieve respectable accuracy.
arXiv Detail & Related papers (2021-04-17T11:07:36Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)