Related papers: Developing Universal Dependency Treebanks for Magahi and Braj

URL: http://arxiv.org/abs/2204.12633v1
Date: Tue, 26 Apr 2022 23:43:41 GMT
Title: Developing Universal Dependency Treebanks for Magahi and Braj
Authors: Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha
Abstract summary: In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences marked with their lemmas, part-of-speech, morphological features and universal dependencies.
Score: 0.7349727826230861
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we discuss the development of treebanks for two low-resourced Indian languages - Magahi and Braj - based on the Universal Dependencies framework. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences marked with their lemmas, part-of-speech, morphological features and universal dependencies. This paper describes the different dependency relationships found in the two languages and gives some statistics for the two treebanks. The dataset will be made publicly available in the Universal Dependencies (UD) repository (https://github.com/UniversalDependencies/UD_Magahi-MGTB/tree/master) in the next (v2.10) release.

Related papers

The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project [0.0]
This paper presents UD-NewsCrawl, the largest Tagalog treebank to date, containing 15.6k trees manually annotated according to the Universal Dependencies framework. We detail our treebank development process, including data collection, pre-processing, manual annotation, and quality assurance procedures.
arXiv Detail & Related papers (2025-05-26T18:25:10Z)
UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions [9.218991698992815]
This paper introduces UD-English-CHILDES, the first officially released Universal Dependencies (UD) treebank with consistent and unified annotation guidelines. Our corpus harmonizes annotations from 11 children and their caregivers, totaling over 48k sentences.
arXiv Detail & Related papers (2025-04-28T23:20:36Z)
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z)
Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base. One of the major challenges facing xKBQA is the high cost of data annotation. We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo [0.8938910048099864]
We launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates.
arXiv Detail & Related papers (2022-06-21T12:58:56Z)
UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years. Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z)
Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information [89.24684041258747]
Sememe knowledge bases (KBs) are built by manually annotating words with sememes. Existing sememe KBs only cover a few languages, which hinders the wide utilization of sememes. This paper aims to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedic dictionary.
arXiv Detail & Related papers (2022-03-14T18:37:09Z)
Prix-LM: Pretraining for Multilingual Knowledge Base Construction [59.02868906044296]
We propose a unified framework, Prix-LM, for multilingual knowledge construction and completion. We leverage two types of knowledge, monolingual triples and cross-lingual links, extracted from existing multilingual KBs. Experiments on standard entity-related tasks, such as link prediction in multiple languages, cross-lingual entity linking and bilingual lexicon induction, demonstrate its effectiveness.
arXiv Detail & Related papers (2021-10-16T02:08:46Z)
Apurinã Universal Dependencies Treebank [0.4893345190925178]
This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features.
arXiv Detail & Related papers (2021-06-07T07:42:00Z)
Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context. We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from a non-contextual PMI estimate.
arXiv Detail & Related papers (2021-04-18T02:43:37Z)
Prague Dependency Treebank -- Consolidated 1.0 [1.7147127043116672]
Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0) contains four different datasets of Czech, uniformly annotated using the standard PDT scheme. Altogether, the treebank contains around 180,000 sentences with their morphological, surface and deep syntactic annotation.
arXiv Detail & Related papers (2020-06-05T20:52:55Z)
Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection [33.86322085911299]
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages. We describe version 2 of the guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.
arXiv Detail & Related papers (2020-04-22T15:38:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.