I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical
Theory
- URL: http://arxiv.org/abs/2007.05772v1
- Date: Sat, 11 Jul 2020 13:34:44 GMT
- Authors: Dana Halabi, Ebaa Fayyoumi, Arafat Awajan
- Abstract summary: This paper constructs a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language.
The proposed treebank, called I3rab, differs from existing Arabic dependency treebanks in two main respects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Treebanks are valuable linguistic resources that include the syntactic
structure of a language's sentences in addition to POS tags and morphological
features. They are mainly used in building statistical parsers. Although
statistical natural language parsers have recently become more accurate for
languages such as English, those for Arabic still have low
accuracy. The purpose of this paper is to construct a new Arabic dependency
treebank based on the traditional Arabic grammatical theory and the
characteristics of the Arabic language, to investigate their effects on the
accuracy of statistical parsers. The proposed Arabic dependency treebank,
called I3rab, differs from existing Arabic dependency treebanks in two main
respects. The first is the approach to determining the main word of the
sentence, and the second is the representation of joined and covert
pronouns. To evaluate I3rab, we compared its performance against a subset of the
Prague Arabic Dependency Treebank that has a comparable level of detail.
The experiments show improvements of up to
7.5% in UAS and 18.8% in LAS.
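The abstract reports results as UAS (unlabeled attachment score) and LAS (labeled attachment score). As a minimal sketch of how these two metrics are conventionally computed, assuming each token is represented by a hypothetical `(head_index, label)` pair and that gold and predicted tokens are already aligned (the labels below are illustrative and are not taken from I3rab's label set):

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 conventionally denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Toy four-token sentence with made-up labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f}  LAS={las:.1f}")
```

In this toy case three of four heads are correct and two of four tokens have both head and label correct. LAS can never exceed UAS, since a labeled match requires a head match.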
Related papers
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
(2024-03-15T13:33:10Z)
- ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic [53.1913348687902]
We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language.
Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA).
Our evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models.
(2024-02-20T09:07:41Z)
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
(2024-02-18T11:46:16Z)
- ALDi: Quantifying the Arabic Level of Dialectness of Text [17.37857915257019]
We argue that Arabic speakers perceive a spectrum of dialectness, which we operationalize at the sentence level as the Arabic Level of Dialectness (ALDi).
We provide a detailed analysis of AOC-ALDi and show that a model trained on it can effectively identify levels of dialectness on a range of other corpora.
(2023-10-20T18:07:39Z)
- AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
(2023-09-21T13:20:13Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource
Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
(2023-07-16T00:45:42Z)
- Graphemic Normalization of the Perso-Arabic Script [47.429213930688086]
This paper documents the challenges that Perso-Arabic presents beyond the best-documented languages.
We focus on the situation in natural language processing (NLP), which is affected by multiple, often neglected, issues.
We evaluate the effects of script normalization on eight languages from diverse language families in the Perso-Arabic script diaspora on machine translation and statistical language modeling tasks.
(2022-10-21T21:59:44Z)
- Interpreting Arabic Transformer Models [18.98681439078424]
We probe how linguistic information is encoded in Arabic pretrained models trained on different varieties of the Arabic language.
We perform a layer and neuron analysis on the models using three intrinsic tasks: two morphological tagging tasks based on MSA (Modern Standard Arabic) and dialectal POS tagging, and a dialect identification task.
(2022-01-19T06:32:25Z)
- Sentiment Analysis in Poems in Misurata Sub-dialect -- A Sentiment
Detection in an Arabic Sub-dialect [0.0]
This study focuses on detecting sentiment in poems written in the Misurata Arabic sub-dialect spoken in Libya.
The tools used to detect sentiment from the dataset are Sklearn and the Mazajak sentiment tool.
(2021-09-15T10:42:39Z)
- Towards One Model to Rule All: Multilingual Strategy for Dialectal
Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR system using a self-attention-based conformer architecture.
We trained the system using Arabic (Ar), English (En) and French (Fr) languages.
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
(2021-05-31T08:20:38Z)
- Effect of Word Embedding Variable Parameters on Arabic Sentiment
Analysis Performance [0.0]
Social media platforms such as Twitter and Facebook have led to a growing number of comments that contain users' opinions.
This study will discuss three parameters (Window size, Dimension of vector and Negative Sample) for Arabic sentiment analysis.
Four binary classifiers (Logistic Regression, Decision Tree, Support Vector Machine and Naive Bayes) are used to detect sentiment.
(2021-01-08T08:31:00Z)