A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns
Analysis
- URL: http://arxiv.org/abs/2010.12937v1
- Date: Sat, 24 Oct 2020 17:22:44 GMT
- Title: A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns
Analysis
- Authors: Arun Kumar Singh, Sushant Dave, Dr. Prathosh A. P., Prof. Brejesh Lall
and Shresth Mehta
- Abstract summary: This paper presents the first benchmark corpus of Sanskrit Pratyaya (suffix) and inflectional words (padas) formed due to suffixes.
In this work, we prepared a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of tools.
We also present our own neural approach for derivative noun analysis and evaluate it against the most prominent Sanskrit Morphological Analysis tools.
- Score: 0.755972004983746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the first benchmark corpus of Sanskrit Pratyaya (suffix) and
inflectional words (padas) formed due to suffixes, along with neural network
based approaches to process the formation and splitting of inflectional words.
Inflectional words spanning primary and secondary derivative nouns form the
scope of the current work. Pratyayas are an important dimension of morphological
analysis of Sanskrit texts. Several Sanskrit Computational Linguistics
tools exist for processing and analyzing Sanskrit texts. Unfortunately, there has not
been any work to standardize and validate these tools specifically for derivative
noun analysis. In this work, we prepared a Sanskrit suffix benchmark corpus
called Pratyaya-Kosh to evaluate the performance of these tools. We also present our
own neural approach for derivative noun analysis and evaluate it against the
most prominent Sanskrit Morphological Analysis tools. This benchmark will be
freely available to researchers worldwide, and we hope it will
motivate all to improve morphological analysis in the Sanskrit language.
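The abstract describes neural approaches to the formation and splitting of inflectional words without detailing an architecture here, so the following is only a minimal illustrative sketch of one plausible formulation, not the authors' model: a character-level bidirectional GRU scores each character boundary of a pada as the split point between the base form and the pratyaya. The romanisation scheme, the class name SplitPointScorer, and the example word "ramatva" are assumptions made for this sketch.

```python
# Minimal sketch (not the paper's actual model): a character-level BiGRU that
# scores each position of a romanised pada as a possible split point between
# the base form and the pratyaya (suffix).
import torch
import torch.nn as nn

class SplitPointScorer(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embed(char_ids)             # (batch, seq, emb_dim)
        states, _ = self.encoder(emb)          # (batch, seq, 2 * hidden)
        return self.score(states).squeeze(-1)  # (batch, seq) boundary logits

# Toy character inventory (assumed romanisation; real work would use IAST/SLP1).
chars = sorted(set("abcdefghijklmnopqrstuvwxyz"))
char2id = {c: i + 1 for i, c in enumerate(chars)}  # 0 is reserved for padding

def encode(word: str) -> torch.Tensor:
    return torch.tensor([[char2id[c] for c in word]])

model = SplitPointScorer(vocab_size=len(char2id) + 1)
word = "ramatva"                       # hypothetical pada: base "rama" + suffix "tva"
logits = model(encode(word))
split = int(logits.argmax(dim=1))      # most likely boundary index (untrained here)
print(word[: split + 1], "+", word[split + 1:])
```

In practice such a scorer would be trained with a cross-entropy loss over gold split positions taken from a corpus such as Pratyaya-Kosh.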
Related papers
- Morphology and Syntax of the Tamil Language [0.0]
The paper highlights the complexity and richness of Tamil in terms of its morphological and syntactic features.
Its practical value is demonstrated by the fact that a rule-based morphological analyser cum generator and a computational grammar for Tamil have already been developed based on this paper.
arXiv Detail & Related papers (2024-01-16T13:52:25Z)
- Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit [1.184066113335041]
This thesis aims to make Sanskrit manuscripts more accessible to the end-users through natural language technologies.
The morphological richness, compounding, free word orderliness, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions.
We identify four fundamental tasks, which are crucial for developing a robust NLP technology for Sanskrit.
arXiv Detail & Related papers (2023-08-17T06:33:33Z)
- Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages [1.0312968200748118]
We present an approach for translating word embeddings from a majority language into 4 minority languages.
Furthermore, we present a novel neural network model that is trained on English data to conduct sentiment analysis.
Our research shows that state-of-the-art neural models can be used with endangered languages.
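The summary above says embeddings are translated from a majority language into minority languages but does not spell out the mapping here. A common way to do this, shown below purely as a hedged sketch rather than the cited paper's exact procedure, is orthogonal Procrustes alignment over a small seed lexicon of translation pairs; the embedding dimension, lexicon size, and random vectors are assumptions.

```python
# Hedged sketch: orthogonal Procrustes alignment of word embeddings, a common
# technique for mapping a source-language space onto a target-language space.
# Not necessarily the cited paper's procedure; the data here is random.
import numpy as np

def procrustes_alignment(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimising ||src @ W - tgt||_F,
    given row-aligned embeddings of seed-lexicon translation pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
dim, n_pairs = 300, 500                     # assumed embedding size and lexicon size
src_seed = rng.normal(size=(n_pairs, dim))  # majority-language vectors for seed words
tgt_seed = rng.normal(size=(n_pairs, dim))  # minority-language vectors for their translations

W = procrustes_alignment(src_seed, tgt_seed)
mapped = src_seed @ W                       # source vectors expressed in the target space
print(W.shape, np.allclose(W @ W.T, np.eye(dim)))  # W is orthogonal
```

Constraining the mapping to be orthogonal preserves distances and angles in the source space, which is why it is a popular choice for cross-lingual alignment.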
arXiv Detail & Related papers (2023-05-24T17:40:20Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by situation representations rather than psychological properties.
The research in this paper lays the basis for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
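The entry above describes synthesising training data by pairing canonical utterances with programs generated from a grammar and then paraphrasing them. As a hedged illustration of that general setup (the grammar, domain, and program syntax below are invented for this sketch and are not the paper's), a tiny synchronous rule can enumerate such pairs:

```python
# Illustrative sketch of generating (canonical utterance, program) training
# pairs from a small synchronous grammar. The rule, domain, and program
# syntax are toy assumptions, not the cited paper's actual grammar.
from itertools import product

ENTITIES = {"paper": "Paper", "author": "Author"}        # assumed domain types
PROPERTIES = {"published in 2020": "year == 2020",
              "about Sanskrit": "topic == 'sanskrit'"}   # assumed filters

def generate_pairs():
    """Expand the grammar rule 'list every <entity> <property>'."""
    for (ent_text, ent_type), (prop_text, prop_expr) in product(
            ENTITIES.items(), PROPERTIES.items()):
        utterance = f"list every {ent_text} {prop_text}"
        program = f"filter(type == '{ent_type}', {prop_expr})"
        yield utterance, program

for utterance, program in generate_pairs():
    print(f"{utterance!r:45} -> {program}")
```

These canonical pairs would then be rewritten by a paraphraser into more natural utterances before a parser is trained on them.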
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Neural disambiguation of lemma and part of speech in morphologically rich languages [0.6346772579930928]
We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages.
We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser.
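To make the setup above concrete, here is a deliberately simplified sketch, not the cited paper's neural method, of the core idea: rank the candidate (lemma, POS) analyses returned by a morphological analyser using context statistics that could be estimated from an unannotated corpus. All counts, tags, and the example form "flies" are toy assumptions.

```python
# Hedged sketch: choose among candidate (lemma, POS) analyses for an ambiguous
# token using POS-transition counts gathered from an unannotated corpus.
from collections import Counter

# Toy "corpus" statistics: counts of (previous POS, current POS) transitions.
transitions = Counter({("DET", "NOUN"): 120, ("DET", "VERB"): 3,
                       ("PRON", "VERB"): 80, ("PRON", "NOUN"): 10})

def disambiguate(prev_pos: str, candidates: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (lemma, POS) candidate whose POS best follows prev_pos."""
    return max(candidates, key=lambda c: transitions[(prev_pos, c[1])])

# A morphological analyser might return two readings for the ambiguous form "flies".
candidates = [("fly", "NOUN"), ("fly", "VERB")]
print(disambiguate("DET", candidates))   # ('fly', 'NOUN') after a determiner
print(disambiguate("PRON", candidates))  # ('fly', 'VERB') after a pronoun
```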
arXiv Detail & Related papers (2020-07-12T21:48:52Z)
- Validation and Normalization of DCS corpus using Sanskrit Heritage tools to build a tagged Gold Corpus [0.0]
The Digital Corpus of Sanskrit records around 650,000 sentences along with their morphological and lexical tagging.
The Sanskrit Heritage Engine's Reader produces all possible segmentations with morphological and lexical analyses.
arXiv Detail & Related papers (2020-05-13T19:23:43Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.