The Persian Dependency Treebank Made Universal
- URL: http://arxiv.org/abs/2009.10205v2
- Date: Wed, 23 Sep 2020 02:44:07 GMT
- Title: The Persian Dependency Treebank Made Universal
- Authors: Mohammad Sadegh Rasooli, Pegah Safari, Amirsaeid Moloodi, Alireza Nourian
- Abstract summary: This treebank contains 29107 sentences.
Our data is more compatible with Universal Dependencies than the Persian Universal Dependency Treebank (Seraji et al., 2016).
Our delexicalized Persian-to-English transfer experiments show that a parsing model trained on our data is 2% more accurate than that of Seraji et al.
- Score: 3.4410212782758047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe an automatic method for converting the Persian Dependency Treebank (Rasooli et al., 2013) to Universal Dependencies. This treebank contains 29107 sentences. Our experiments, along with manual linguistic analysis, show that our data is more compatible with Universal Dependencies than the Uppsala Persian Universal Dependency Treebank (Seraji et al., 2016), and is larger in size and more diverse in vocabulary. A parser trained on our data achieves a labeled attachment F-score of 85.2 in supervised parsing. Our delexicalized Persian-to-English parser transfer experiments show that a parsing model trained on our data is ~2% (absolute) more accurate than one trained on the data of Seraji et al. (2016) in terms of labeled attachment score.
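As a rough illustration of the metric quoted in the abstract, the sketch below computes unlabeled and labeled attachment scores for a single sentence. It assumes gold and predicted analyses share the same tokenization (the paper reports an F-score, which also covers cases where they differ); the function name `attachment_scores` and the four-token example are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# One arc per token: (head index, dependency label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both correct.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical four-token sentence: token 3 has a wrong head,
# token 4 has the right head but a wrong label.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")
```

Under this definition, a "~2% absolute" gain means the labeled score itself rises by about two points, not that the error rate falls by 2%.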
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam), manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z) - Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Universal Dependency Treebank for Odia Language [0.24466725954625887]
This paper presents the first publicly available treebank of Odia, a morphologically rich low resource Indian language.
The treebank contains approx. 1082 tokens (100 sentences) in Odia selected from "Samantar", the largest available parallel corpora collection for Indic languages.
The morphological analysis of the Odia treebank was performed using machine learning techniques.
arXiv Detail & Related papers (2022-05-24T11:19:26Z) - LyS_ACoruña at SemEval-2022 Task 10: Repurposing Off-the-Shelf Tools for Sentiment Analysis as Semantic Dependency Parsing [10.355938901584567]
This paper addresses the problem of structured sentiment analysis using a bi-affine semantic dependency parser.
For the monolingual setup, we considered: (i) training on a single treebank, and (ii) relaxing the setup by training on treebanks coming from different languages.
For the zero-shot setup and a given target treebank, we relied on: (i) a word-level translation of available treebanks in other languages to get noisy, unlikely-grammatical, but annotated data.
In the post-evaluation phase, we also trained cross-lingual models that simply merged all the English tree
arXiv Detail & Related papers (2022-04-27T10:21:28Z) - Informal Persian Universal Dependency Treebank [19.359203472636835]
This paper presents the phonological, morphological, and syntactic distinctions between formal and informal Persian.
We develop the open-source Informal Persian Universal Dependency Treebank, a new treebank annotated within the Universal Dependency scheme.
arXiv Detail & Related papers (2022-01-10T22:33:07Z) - A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Constructing Taxonomies from Pretrained Language Models [52.53846972667636]
We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models.
Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees.
We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees.
arXiv Detail & Related papers (2020-10-24T07:16:21Z) - GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words that have long-range dependencies or that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z) - I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory [0.0]
This paper constructs a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language.
The proposed Arabic dependency treebank, called I3rab, differs from existing Arabic dependency treebanks in two main respects.
arXiv Detail & Related papers (2020-07-11T13:34:44Z) - Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.