Related papers: The Arabic Ontology -- An Arabic Wordnet with Ontologically Clean Content

URL: http://arxiv.org/abs/2205.09664v1
Date: Thu, 19 May 2022 16:27:44 GMT
Title: The Arabic Ontology -- An Arabic Wordnet with Ontologically Clean Content
Authors: Mustafa Jarrar
Abstract summary: The ontology consists of about 1,300 well-investigated concepts in addition to 11,000 concepts that are partially validated. It is accessible and searchable through a lexicographic search engine. It is fully mapped with Princeton WordNet, Wikidata, and other resources.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a formal Arabic wordnet built on the basis of a carefully designed ontology hereby referred to as the Arabic Ontology. The ontology provides a formal representation of the concepts that the Arabic terms convey, and its content was built with ontological analysis in mind, and benchmarked to scientific advances and rigorous knowledge sources as much as this is possible, rather than to only speakers' beliefs as lexicons typically are. A comprehensive evaluation was conducted thereby demonstrating that the current version of the top-levels of the ontology can top the majority of the Arabic meanings. The ontology consists currently of about 1,300 well-investigated concepts in addition to 11,000 concepts that are partially validated. The ontology is accessible and searchable through a lexicographic search engine (https://ontology.birzeit.edu) that also includes about 150 Arabic-multilingual lexicons, and which are being mapped and enriched using the ontology. The ontology is fully mapped with Princeton WordNet, Wikidata, and other resources.

Related papers

WikiTermBase: An AI-Augmented Term Base to Standardize Arabic Translation on Wikipedia [0.0]
This abstract introduces an open source tool, WikiTermBase, with a systematic approach for building a lexicographical database with over 900K terms. The tool was successfully implemented on Arabic Wikipedia to standardize translated English and French terms.
arXiv Detail & Related papers (2025-05-26T11:27:01Z)
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin? [0.4751886527142778]
Arabizi is a hybrid form of Arabic that incorporates Latin characters and numbers. It poses significant challenges for machine translation due to its lack of formal structure. This research project investigates the model's performance in translating Arabizi into both Modern Standard Arabic and English.
arXiv Detail & Related papers (2025-02-28T11:37:52Z)
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLM) in the Arab world. One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion.
arXiv Detail & Related papers (2024-12-16T19:29:06Z)
Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis [49.1574468325115]
The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020. The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches. There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
arXiv Detail & Related papers (2024-03-04T10:37:48Z)
Noor-Ghateh: A Benchmark Dataset for Evaluating Arabic Word Segmenters in Hadith Domain [5.916745177895035]
In this paper, we present a standard dataset for analyzing the Arabic segmentation tools, which includes approximately 223,690 words from the "Shariat al-Islam" book. To estimate the dataset, we applied different methods, including Farasa, Camel, and ALP, and reported the annotation quality and analyzed the benchmark specifications as well.
arXiv Detail & Related papers (2023-06-22T16:50:40Z)
Curras + Baladi: Towards a Levantine Corpus [0.0]
We present the Lebanese Corpus Baladi that consists of around 9.6K morphologically annotated tokens. Our proposed corpus was constructed to be used to enrich Curras and transform it into a more general Levantine corpus.
arXiv Detail & Related papers (2022-05-19T16:53:04Z)
Urdu Morphology, Orthography and Lexicon Extraction [0.0]
This paper describes an implementation of the Urdu language as a software API. We deal with orthography, morphology and the extraction of the lexicon.
arXiv Detail & Related papers (2022-04-06T20:14:01Z)
Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies. We propose a Triplet-BERT model and a method that generates training data based on semantic training data. The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to searching concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
New Arabic Medical Dataset for Diseases Classification [55.41644538483948]
We introduce a new Arab medical dataset, which includes two thousand medical documents collected from several Arabic medical websites. The dataset was built for the task of classifying texts and includes 10 classes (Blood, Bone, Cardiovascular, Ear, Endocrine, Eye, Gastrointestinal, Immune, Liver and Nephrological). Experiments on the dataset were performed by fine-tuning three pre-trained models: BERT from Google, Arabert, which is based on BERT with a large Arabic corpus, and AraBioNER, which is based on Arabert with an Arabic medical corpus.
arXiv Detail & Related papers (2021-06-29T10:42:53Z)
Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data. The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z)
Neural Coreference Resolution for Arabic [12.986359659930146]
We introduce a coreference resolution system for Arabic based on Lee et al.'s end-to-end architecture combined with the Arabic version of BERT and an external mention detector. As far as we know, this is the first neural coreference resolution system aimed specifically at Arabic. It substantially outperforms the existing state of the art on OntoNotes 5.0 with a gain of 15.2 points CoNLL F1.
arXiv Detail & Related papers (2020-10-31T14:34:43Z)
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey [0.0]
Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. In this paper, we present a comprehensive survey of Arabic dialect identification research in written texts. We review the traditional machine learning methods, deep learning architectures, and complex learning approaches to Arabic dialect identification.
arXiv Detail & Related papers (2020-09-26T15:33:16Z)
Quran Intelligent Ontology Construction Approach Using Association Rules Mining [0.0]
This research project is concerned with the use of association rules to extract the Quran ontology. Our system is based on the combination of statistics and methods to extract semantic and conceptual relations from Quran verses. The Quran concepts will offer a new and powerful representation of Quran knowledge, and the association rules will help to represent the relations between all classes of connected concepts in the Quran.
arXiv Detail & Related papers (2020-08-07T15:48:58Z)
Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features. The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia. The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z)
Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches. We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.