Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective
- URL: http://arxiv.org/abs/2406.09549v2
- Date: Wed, 02 Oct 2024 11:44:26 GMT
- Title: Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective
- Authors: Nudrat Habib
- Abstract summary: We use dependency parsing to analyze news articles in Urdu.
We achieve a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%.
- Score: 0.0
- Abstract: Parsing is the process of analyzing a sentence's syntactic structure by breaking it down into its grammatical components, and is critical for various linguistic applications. Urdu is a low-resource, free word-order language and exhibits complex morphology. Literature suggests that dependency parsing is well-suited for such languages. Our approach begins with a basic feature model encompassing word location, head word identification, and dependency relations, followed by a more advanced model integrating part-of-speech (POS) tags and morphological attributes (e.g., suffixes, gender). We manually annotated a corpus of news articles of varying complexity. Using MaltParser and the NivreEager algorithm, we achieved a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%, demonstrating the feasibility of dependency parsing for Urdu.
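To make the reported numbers concrete, here is a minimal sketch of how the two metrics are typically computed from per-token annotations. It assumes a simple (head index, relation label) representation of each tree; it is not the paper's actual evaluation code, and `attachment_scores` is an illustrative name.

```python
# Minimal sketch (not the paper's evaluation code) of the two reported
# metrics: UAS is the share of tokens whose predicted head is correct,
# LA is the share of tokens whose predicted dependency label is correct.

def attachment_scores(gold, pred):
    """gold/pred: one (head_index, relation_label) pair per token."""
    assert len(gold) == len(pred), "trees must cover the same tokens"
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    return uas, la

# Toy 5-token sentence: one head error and one label error.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (5, "case"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (5, "case"), (2, "nmod")]
uas, la = attachment_scores(gold, pred)
print(f"UAS = {uas:.0%}, LA = {la:.0%}")  # UAS = 80%, LA = 80%
```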
Related papers
- Integrating Supertag Features into Neural Discontinuous Constituent Parsing [0.0]
Traditional views of constituency demand that constituents consist of adjacent words, yet discontinuous constituents are common in languages like German.
Transition-based parsing produces trees given raw text input using supervised learning on large annotated corpora.
arXiv Detail & Related papers (2024-10-11T12:28:26Z)
- CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages [10.441585970299547]
We propose a contrastive self-supervised learning method to make the model robust to word order variations.
Our proposed modification demonstrates a substantial average gain of 3.03/2.95 points across 7 relatively free word order languages (a generic sketch of such a contrastive objective appears after this list).
arXiv Detail & Related papers (2024-10-09T14:38:49Z)
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures [14.714725860010724]
We focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing.
This article briefly reviews the representative models of constituent parsing and dependency parsing, and also dependency parsing with rich semantics.
arXiv Detail & Related papers (2020-06-19T10:21:17Z)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing? [22.93722845643562]
We show that POS tagging can still significantly improve parsing performance when using the Stack joint framework.
Considering that it is much cheaper to annotate POS tags than parse trees, we also investigate the utilization of large-scale heterogeneous POS tag data.
arXiv Detail & Related papers (2020-03-06T13:47:30Z)
- A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning [0.0]
We propose two approaches to dependency parsing especially for languages with restricted amount of training data.
Our first approach combines a state-of-the-art deep learning-based approach with a rule-based one, and the second incorporates morphological information into the network.
The proposed methods are developed for Turkish, but can be adapted to other languages as well.
arXiv Detail & Related papers (2020-02-24T08:34:33Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
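For the CSSL entry above, the mechanics are easiest to see in code. The following is a generic, illustrative sketch, not the paper's actual method: it pulls a sentence's representation toward that of a word-order permutation of itself while pushing away other sentences in the batch. The names `toy_encode` and `info_nce` are hypothetical, and the position-weighted toy encoder merely stands in for a real order-sensitive parser encoder.

```python
# Generic sketch of a contrastive objective over word-order permutations
# (illustrative only; not the CSSL paper's exact method).
import numpy as np

def toy_encode(token_vecs):
    """Order-sensitive toy encoder: position-weighted sum of token vectors,
    so shuffling the word order actually changes the sentence vector."""
    weights = 1.0 / (np.arange(len(token_vecs)) + 1.0)
    return (weights[:, None] * token_vecs).sum(axis=0)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] should match positives[i]; the other
    positives in the batch serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # align i-th anchor with i-th positive

rng = np.random.default_rng(0)
sentences = [rng.normal(size=(n, 64)) for n in (5, 7, 6)]   # toy token embeddings
anchors = np.stack([toy_encode(s) for s in sentences])
positives = np.stack([toy_encode(rng.permutation(s)) for s in sentences])  # shuffled word order
print(f"contrastive loss: {info_nce(anchors, positives):.4f}")
```

Minimizing such a loss encourages the encoder to produce similar representations regardless of word order, which is the robustness property the CSSL summary describes.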