Exploring Syntactic Patterns in Urdu: A Deep Dive into Dependency Analysis
- URL: http://arxiv.org/abs/2406.09549v1
- Date: Thu, 13 Jun 2024 19:30:32 GMT
- Title: Exploring Syntactic Patterns in Urdu: A Deep Dive into Dependency Analysis
- Authors: Nudrat Habib
- Abstract summary: The dependency parsing approach is better suited for order-free languages like Urdu.
The dependency tagset is designed after careful consideration of the complex morphological structure of the Urdu language.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parsing is the process of breaking a sentence into its grammatical components and identifying its syntactic structure. A syntactically correct sentence structure is obtained by assigning grammatical labels to its constituents using a lexicon and syntactic rules. Parsers are extremely useful in linguistics because of their many applications, such as named entity recognition, QA systems, and information extraction. The two most common parsing techniques are phrase structure and dependency structure. Because Urdu is a low-resource language, there has been little progress in building an Urdu parser. A comparison of several parsers revealed that the dependency parsing approach is better suited for order-free languages such as Urdu. We have made significant progress in parsing Urdu, a South Asian language with a complex morphology. For Urdu dependency parsing, a basic feature model consisting of word location, word head, and dependency relation is employed as a starting point, followed by more complex feature models. The dependency tagset is designed after careful consideration of the complex morphological structure of the Urdu language, word-order variation, and lexical ambiguity, and it contains 22 tags. Our dataset comprises sentences from news articles, and we tried to include sentences of varying complexity (which is quite challenging) to get reliable results. All experiments are performed using MaltParser, exploring all 9 algorithms and classifiers. We achieved a 70 percent overall best labeled accuracy (LA) and an 84 percent overall best unlabeled attachment score (UAS) using the Nivre arc-eager algorithm. Error assessment is then carried out by comparing the parser output against manually parsed treebank test data to identify the errors produced by the parser.
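The basic feature model described above (word location, word head, and dependency relation) corresponds to the columns of the CoNLL-style token format that MaltParser consumes. As a minimal sketch of that representation, the snippet below uses a hypothetical English sentence and illustrative relation labels, not the paper's Urdu data or its 22-tag set:

```python
# Minimal reader for CoNLL-style dependency annotations: each token line
# carries an index, the word form, its head's index (0 = artificial root),
# and the dependency relation -- the same triple used as the basic feature model.

def read_conll(block: str):
    """Parse one sentence block into (index, form, head, deprel) tuples."""
    tokens = []
    for line in block.strip().splitlines():
        idx, form, head, deprel = line.split("\t")
        tokens.append((int(idx), form, int(head), deprel))
    return tokens

def is_tree(tokens):
    """Basic well-formedness check: exactly one token attaches to the
    artificial root (0) and every head index is valid.
    (Acyclicity is not checked in this sketch.)"""
    heads = [head for _, _, head, _ in tokens]
    n = len(tokens)
    return heads.count(0) == 1 and all(0 <= h <= n for h in heads)

# Hypothetical sentence: "birds fly south", head-annotated by hand.
sentence = "1\tbirds\t2\tSUBJ\n2\tfly\t0\tROOT\n3\tsouth\t2\tADV"
toks = read_conll(sentence)
print(is_tree(toks))  # True
```

A well-formedness check of this kind is what separates arbitrary head assignments from the single-rooted trees that transition-based algorithms such as Nivre arc-eager produce.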
Related papers
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - More Than Words: Collocation Tokenization for Latent Dirichlet
Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z) - ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set
of Simple Sentences [7.639576741566091]
We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source.
Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies.
We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition.
arXiv Detail & Related papers (2021-06-22T19:31:28Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - A Practical Chinese Dependency Parser Based on A Large-scale Dataset [21.359679124869402]
Dependency parsing is a longstanding natural language processing task, with its outputs crucial to various downstream tasks.
Recently, neural network based (NN-based) dependency parsing has achieved significant progress and obtained state-of-the-art results.
However, NN-based approaches require massive amounts of labeled training data, which is very expensive because it requires annotation by human experts.
arXiv Detail & Related papers (2020-09-02T08:41:46Z) - Machine learning approach of Japanese composition scoring and writing
aided system's design [0.0]
A composition scoring system can greatly assist language learners.
It can help language learners improve as they produce written output.
Foreign language learners in particular are usually most concerned with lexical and syntactic content.
arXiv Detail & Related papers (2020-08-26T11:01:13Z) - How to Probe Sentence Embeddings in Low-Resource Languages: On
Structural Design Choices for Probing Task Evaluation [82.96358326053115]
We investigate sensitivity of probing task results to structural design choices.
We probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English.
We find that results on English do not transfer to other languages.
arXiv Detail & Related papers (2020-06-16T12:37:50Z) - A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z) - SPARQA: Skeleton-based Semantic Parsing for Complex Questions over
Knowledge Bases [27.343078784035693]
We propose a novel skeleton grammar to represent the high-level structure of a complex question.
This dedicated coarse-grained formalism with a BERT-based parsing algorithm helps to improve the accuracy of the downstream fine-grained semantic parsing.
Our approach shows promising performance on several datasets.
arXiv Detail & Related papers (2020-03-31T05:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.