Modern Uyghur Dependency Treebank (MUDT): An Integrated Morphosyntactic Framework for a Low-Resource Language
- URL: http://arxiv.org/abs/2507.21536v1
- Date: Tue, 29 Jul 2025 07:02:04 GMT
- Title: Modern Uyghur Dependency Treebank (MUDT): An Integrated Morphosyntactic Framework for a Low-Resource Language
- Authors: Jiaxin Zuo, Yiquan Wang, Yuan Pan, Xiadiya Yibulayin
- Abstract summary: This study introduces a dependency annotation framework designed to overcome the limitations of existing treebanks. The Modern Uyghur Dependency Treebank (MUDT) provides a more accurate and semantically transparent representation.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To address a critical resource gap in Uyghur Natural Language Processing (NLP), this study introduces a dependency annotation framework designed to overcome the limitations of existing treebanks for this low-resource, agglutinative language. The proposed inventory includes 18 main relations and 26 subtypes, with specific labels such as cop:zero for verbless clauses and instr:case=loc/dat for nuanced instrumental functions. To empirically validate the necessity of this tailored approach, we conducted a cross-standard evaluation using a pre-trained Universal Dependencies parser. The analysis revealed a systematic 47.9% divergence in annotations, pinpointing the inadequacy of universal schemes for handling Uyghur-specific structures. Grounded in nine annotation principles that ensure typological accuracy and semantic transparency, the Modern Uyghur Dependency Treebank (MUDT) provides a more accurate and semantically transparent representation, designed to enable significant improvements in parsing and downstream NLP tasks, and offers a replicable model for other morphologically complex languages.
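The cross-standard evaluation described above boils down to comparing two annotations of the same tokens and measuring how often they disagree. The sketch below is a hypothetical illustration of that kind of per-token divergence statistic: the tokens, tree structures, and the function name label_divergence are invented for this example; only the relation labels cop:zero and instr:case=loc are taken from the paper.

```python
# Hypothetical sketch of a cross-standard divergence measure.
# Each token is represented as a (head index, relation label) pair;
# the trees below are placeholders, not data from the MUDT paper.

def label_divergence(scheme_a, scheme_b):
    """Fraction of tokens whose (head, relation) pair differs between schemes."""
    assert len(scheme_a) == len(scheme_b), "schemes must annotate the same tokens"
    diffs = sum(1 for a, b in zip(scheme_a, scheme_b) if a != b)
    return diffs / len(scheme_a)

# Universal-style annotation of a three-token sentence (invented).
ud_style = [(2, "nsubj"), (0, "root"), (2, "obl")]

# Tailored annotation of the same tokens, using a paper-specific label (invented tree).
mudt_style = [(2, "nsubj"), (0, "root"), (2, "instr:case=loc")]

print(f"divergence: {label_divergence(ud_style, mudt_style):.1%}")  # prints "divergence: 33.3%"
```

The paper's reported 47.9% figure would be this kind of ratio computed over the whole treebank, under whatever matching criteria the authors actually used.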
Related papers
- HeQ: a Large and Diverse Hebrew Reading Comprehension Benchmark [54.73504952691398]
We set out to deliver a Hebrew machine reading comprehension dataset based on extractive question answering. The morphologically rich nature of Hebrew poses a challenge to this endeavor. We devise a novel set of guidelines, a controlled crowdsourcing protocol, and revised evaluation metrics.
arXiv Detail & Related papers (2025-08-03T15:53:01Z)
- Pushing the boundary on Natural Language Inference [49.15148871877941]
Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering and information retrieval. Despite its importance, current NLI systems heavily rely on datasets containing artifacts and biases, limiting inference quality and real-world applicability. This work provides a framework for building robust NLI systems without sacrificing quality or real-world applicability.
arXiv Detail & Related papers (2025-04-25T14:20:57Z)
- Dependency Parsing with the Structuralized Prompt Template [14.547116901025506]
Dependency parsing is a fundamental task in natural language processing (NLP). We propose a novel dependency parsing method that relies solely on an encoder model with a text-to-text training approach. Our experimental results demonstrate that the proposed method achieves outstanding performance compared to traditional models.
arXiv Detail & Related papers (2025-02-24T07:25:10Z)
- Specifying Genericity through Inclusiveness and Abstractness Continuous Scales [1.024113475677323]
This paper introduces a novel annotation framework for the fine-grained modeling of Noun Phrases' (NPs) genericity in natural language.
The framework is designed to be simple and intuitive, making it accessible to non-expert annotators and suitable for crowd-sourced tasks.
arXiv Detail & Related papers (2024-03-22T15:21:07Z)
- Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures [104.32063681736349]
We present an approach to describe predicate-argument structures using natural language definitions instead of discrete labels.
Our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance.
arXiv Detail & Related papers (2022-12-02T11:19:16Z)
- Syntactic Substitutability as Unsupervised Dependency Syntax [31.488677474152794]
We model a more general property implicit in the definition of dependency relations, syntactic substitutability.
This property captures the fact that words at either end of a dependency can be substituted with words from the same category.
We show that increasing the number of substitutions used improves parsing accuracy on natural data.
arXiv Detail & Related papers (2022-11-29T09:01:37Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Learning compositional structures for semantic graph parsing [81.41592892863979]
We show how AM dependency parsing can be trained directly on a neural latent-variable model.
Our model picks up on several linguistic phenomena on its own and achieves comparable accuracy to supervised training.
arXiv Detail & Related papers (2021-06-08T14:20:07Z)
- Coordinate Constructions in English Enhanced Universal Dependencies: Analysis and Computational Modeling [1.9950682531209154]
We address the representation of coordinate constructions in Enhanced Universal Dependencies (UD).
We create a large-scale dataset of manually edited syntax graphs.
We identify several systematic errors in the original data, and propose to also propagate adjuncts.
arXiv Detail & Related papers (2021-03-16T10:24:27Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.