Developing Universal Dependency Treebanks for Magahi and Braj
- URL: http://arxiv.org/abs/2204.12633v1
- Date: Tue, 26 Apr 2022 23:43:41 GMT
- Title: Developing Universal Dependency Treebanks for Magahi and Braj
- Authors: Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha
- Abstract summary: In this paper, we discuss the development of treebanks for two low-resourced Indian languages, Magahi and Braj.
The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, annotated with lemmas, part-of-speech tags, morphological features and universal dependencies.
- Score: 0.7349727826230861
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we discuss the development of treebanks for two low-resourced
Indian languages, Magahi and Braj, based on the Universal Dependencies
framework. The Magahi treebank contains 945 sentences and the Braj treebank around
500 sentences, annotated with lemmas, part-of-speech tags, morphological features
and universal dependencies. This paper describes the different
dependency relations found in the two languages and gives some statistics for
the two treebanks. The dataset will be made publicly available in the Universal
Dependencies (UD) repository
(https://github.com/UniversalDependencies/UD_Magahi-MGTB/tree/master) in the
next (v2.10) release.
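The annotation layers named in the abstract (lemmas, part-of-speech tags, morphological features, dependencies) are what UD treebanks store in the CoNLL-U format: one token per line with ten tab-separated columns and a blank line after each sentence. As a minimal sketch, assuming a treebank in that format, the Python below computes the kind of basic statistics the abstract mentions. The sample sentence is an invented English toy example, not data from the Magahi or Braj treebanks, and `read_conllu` and `treebank_stats` are hypothetical helper names, not part of any UD tool.

```python
from collections import Counter

# The ten standard CoNLL-U columns, in order.
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]

# Invented English toy sentence (NOT taken from the Magahi or Braj treebanks).
_ROWS = [
    ["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"],
    ["2", "bark", "bark", "VERB", "_", "Tense=Pres", "0", "root", "_", "_"],
    ["3", "loudly", "loudly", "ADV", "_", "_", "2", "advmod", "_", "SpaceAfter=No"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# sent_id = toy-1\n# text = Dogs bark loudly.\n"
SAMPLE += "\n".join("\t".join(row) for row in _ROWS) + "\n\n"


def read_conllu(text):
    """Yield each sentence as a list of token dicts keyed by COLUMNS."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(COLUMNS):
            continue  # ignore malformed lines in this sketch
        if not cols[0].isdigit():
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            continue
        sentence.append(dict(zip(COLUMNS, cols)))
    if sentence:
        # Handle input that lacks a trailing blank line.
        yield sentence


def treebank_stats(text):
    """Count sentences and tokens and tally UPOS tags and dependency relations."""
    sentences = list(read_conllu(text))
    return {
        "sentences": len(sentences),
        "tokens": sum(len(s) for s in sentences),
        "upos": Counter(tok["upos"] for s in sentences for tok in s),
        "deprel": Counter(tok["deprel"] for s in sentences for tok in s),
    }


stats = treebank_stats(SAMPLE)
print(stats["sentences"], stats["tokens"])
print(stats["deprel"].most_common())
```

To run the same counts on a real treebank, one would read the contents of a `.conllu` file into a string and pass it to `treebank_stats`; the file names in the UD repository are not given in the abstract, so none is assumed here.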
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo [0.8938910048099864]
We launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru.
We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates.
arXiv Detail & Related papers (2022-06-21T12:58:56Z)
- UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z)
- Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information [89.24684041258747]
Sememe knowledge bases (KBs) are built by manually annotating words with sememes.
Existing sememe KBs only cover a few languages, which hinders the wide utilization of sememes.
This paper aims to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedia dictionary.
arXiv Detail & Related papers (2022-03-14T18:37:09Z)
- Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion.
We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs.
Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
arXiv Detail & Related papers (2021-10-16T02:08:46Z)
- Apurinã Universal Dependencies Treebank [0.4893345190925178]
This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language.
The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features.
arXiv Detail & Related papers (2021-06-07T07:42:00Z)
- Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimate.
arXiv Detail & Related papers (2021-04-18T02:43:37Z)
- Prague Dependency Treebank -- Consolidated 1.0 [1.7147127043116672]
Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0)
PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme.
Altogether, the treebank contains around 180,000 sentences with their morphological, surface and deep syntactic annotation.
arXiv Detail & Related papers (2020-06-05T20:52:55Z)
- Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection [33.86322085911299]
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages.
We describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
arXiv Detail & Related papers (2020-04-22T15:38:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.