A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends
- URL: http://arxiv.org/abs/2302.03512v3
- Date: Tue, 8 Aug 2023 13:27:29 GMT
- Title: A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends
- Authors: Xiaoye Qu, Yingjie Gu, Qingrong Xia, Zechang Li, Zhefeng Wang, Baoxing Huai
- Abstract summary: We provide a comprehensive review of the development of Arabic NER.
Traditional Arabic NER systems focus on feature engineering and designing domain-specific rules.
With the growth of pre-trained language models, Arabic NER systems have achieved better performance.
- Score: 15.302538985992518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As more and more Arabic texts emerge on the Internet, extracting
important information from them becomes especially useful. As a fundamental
technology, named entity recognition (NER) serves as the core component of
information extraction, while also playing a critical role in many other
Natural Language Processing (NLP) systems, such as question answering and
knowledge graph building. In this paper, we provide a comprehensive review of
the development of Arabic NER, especially the recent advances in deep learning
and pre-trained language models. Specifically, we first introduce the
background of Arabic NER, including the characteristics of the Arabic language
and existing resources for Arabic NER. Then, we systematically review the
development of Arabic NER methods. Traditional Arabic NER systems focus on
feature engineering and the design of domain-specific rules. In recent years,
deep learning methods have achieved significant progress by representing texts
via continuous vector representations. With the growth of pre-trained language
models, Arabic NER has achieved further performance gains. Finally, we
summarize the methodological gap between Arabic NER and NER in other languages,
which helps outline future directions for Arabic NER.
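To make the shift from feature engineering to pre-trained language models concrete, the following minimal sketch shows how a pre-trained Arabic model fine-tuned for token classification can be applied to NER with the Hugging Face transformers library. The checkpoint name is a hypothetical placeholder, not a model from the surveyed paper; any Arabic NER checkpoint trained with BIO-style labels could be substituted.

```python
# Minimal sketch: Arabic NER as token classification with a pre-trained
# language model. The checkpoint name below is a placeholder assumption;
# substitute a real Arabic NER checkpoint before running.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "your-org/arabic-ner-bert"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# "simple" aggregation merges sub-word pieces back into whole entity spans.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# "Taha Hussein was born in Minya Governorate in Egypt."
text = "ولد طه حسين في محافظة المنيا في مصر"

for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

By contrast, the traditional systems reviewed in the survey would hand-craft features (gazetteers, morphological analyses, orthographic cues) and feed them to a statistical tagger such as a CRF, rather than relying on learned contextual representations.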
Related papers
- Gazelle: An Instruction Dataset for Arabic Writing Assistance [12.798604366250261]
We present Gazelle, a comprehensive dataset for Arabic writing assistance.
We also offer an evaluation framework designed to enhance Arabic writing assistance tools.
Our findings underscore the need for continuous model training and dataset enrichment.
arXiv: 2024-10-23
- Computational Approaches to Arabic-English Code-Switching [0.0]
We propose and apply state-of-the-art techniques for Modern Standard Arabic and Arabic-English NER tasks.
We have created the first annotated CS Arabic-English corpus for the NER task.
All methods showed improvements in the performance of the NER taggers on CS data.
arXiv: 2024-10-17
- Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART [4.214194481944042]
We have developed an advanced text summarization system targeting Arabic textbooks.
This system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum.
arXiv: 2024-06-11
- Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis [49.1574468325115]
The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020.
The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches.
There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
arXiv: 2024-03-04
- On the importance of Data Scale in Pretraining Arabic Language Models [46.431706010614334]
We conduct a comprehensive study on the role of data in Arabic Pretrained Language Models (PLMs).
We reassess the performance of a suite of state-of-the-art Arabic PLMs by retraining them on massive-scale, high-quality Arabic corpora.
Our analysis strongly suggests that pretraining data is by far the primary contributor to performance, surpassing other factors.
arXiv: 2024-01-15
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv: 2023-09-24
- AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic.
The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv: 2023-09-21
- Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training [0.34998703934432673]
This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP).
To overcome this limitation, we create a dedicated data set from publicly available resources.
We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches.
arXiv: 2023-07-27
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv: 2021-06-01
- TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus [3.8580784887142774]
This article describes the constitution process of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC).
Arabish, also known as Arabizi, is a spontaneous coding of Arabic dialects in Latin characters and arithmographs (numbers used as letters).
arXiv: 2020-03-20
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv: 2019-10-23
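The final related paper above casts every text problem, NER included, as mapping an input string to an output string. The sketch below shows one way an Arabic NER example could be serialized in such a text-to-text format; the prompt wording, label names, and output serialization are illustrative assumptions, not taken from that paper.

```python
# Illustrative sketch: casting an Arabic NER example into a text-to-text
# format, in the spirit of the unified text-to-text framework above. The
# prompt string and serialization are assumptions chosen for illustration.
from typing import List, Tuple


def to_text_to_text(sentence: str, entities: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Turn a sentence and its gold (span, label) entities into (source, target) strings."""
    source = "extract named entities: " + sentence
    target = "; ".join(f"{label}: {span}" for span, label in entities)
    return source, target


# "Taha Hussein was born in Minya Governorate in Egypt."
sentence = "ولد طه حسين في محافظة المنيا في مصر"
entities = [("طه حسين", "PER"), ("محافظة المنيا", "LOC"), ("مصر", "LOC")]

src, tgt = to_text_to_text(sentence, entities)
print(src)  # extract named entities: ولد طه حسين في محافظة المنيا في مصر
print(tgt)  # PER: طه حسين; LOC: محافظة المنيا; LOC: مصر
```

Such (source, target) pairs could then be used to fine-tune a T5-style encoder-decoder model, so that tagging and generation tasks share a single training objective.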