A Panoramic Survey of Natural Language Processing in the Arab World
- URL: http://arxiv.org/abs/2011.12631v3
- Date: Mon, 27 Sep 2021 07:13:59 GMT
- Title: A Panoramic Survey of Natural Language Processing in the Arab World
- Authors: Kareem Darwish and Nizar Habash and Mourad Abbas and Hend Al-Khalifa
and Huseein T. Al-Natsheh and Samhaa R. El-Beltagy and Houda Bouamor and
Karim Bouzoubaa and Violetta Cavalli-Sforza and Wassim El-Hajj and Mustafa
Jarrar and Hamdy Mubarak
- Abstract summary: The term natural language refers to any system of symbolic communication (spoken, signed or written) without intentional human planning and design.
Natural language processing (NLP) is the sub-field of artificial intelligence (AI) focused on modeling natural languages to build applications such as speech recognition and synthesis, machine translation, optical character recognition (OCR), sentiment analysis (SA), question answering, dialogue systems, etc.
- Score: 12.064637486695485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The term natural language refers to any system of symbolic communication
(spoken, signed or written) without intentional human planning and design. This
distinguishes natural languages such as Arabic and Japanese from artificially
constructed languages such as Esperanto or Python. Natural language processing
(NLP) is the sub-field of artificial intelligence (AI) focused on modeling
natural languages to build applications such as speech recognition and
synthesis, machine translation, optical character recognition (OCR), sentiment
analysis (SA), question answering, dialogue systems, etc. NLP is a highly
interdisciplinary field with connections to computer science, linguistics,
cognitive science, psychology, mathematics and others. Some of the earliest AI
applications were in NLP (e.g., machine translation); and the last decade
(2010-2020) in particular has witnessed an incredible increase in quality,
matched with a rise in public awareness, use, and expectations of what may have
seemed like science fiction in the past. NLP researchers pride themselves on
developing language-independent models and tools that can be applied to all
human languages; e.g., machine translation systems can be built for a variety of
languages using the same basic mechanisms and models. However, the reality is
that some languages do get more attention (e.g., English and Chinese) than
others (e.g., Hindi and Swahili). Arabic, the primary language of the Arab
world and the religious language of millions of non-Arab Muslims, is somewhere
in the middle of this continuum. Though Arabic NLP has many challenges, it has
seen many successes and developments. Next, we discuss Arabic's main challenges
as a necessary background, and we present a brief history of Arabic NLP. We
then survey a number of its research areas, and close with a critical
discussion of the future of Arabic NLP.
Related papers
- Computational Approaches to Arabic-English Code-Switching [0.0]
We propose and apply state-of-the-art techniques for Modern Standard Arabic and Arabic-English NER tasks.
We have created the first annotated CS Arabic-English corpus for the NER task.
All methods showed improvements in the performance of the NER taggers on CS data.
(2024-10-17T08:20:29Z)
- Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences [31.62071644137294]
We discuss the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP.
We report encouraging results in the development of high-quality machine learning translators for Indigenous languages.
We present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing.
(2024-07-17T14:46:37Z)
- Enhancing Language Learning through Technology: Introducing a New English-Azerbaijani (Arabic Script) Parallel Corpus [0.9051256541674136]
This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus.
It is designed to bridge the technological gap in language learning and machine translation for under-resourced languages.
(2024-07-06T21:23:20Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
(2024-02-19T09:15:28Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
(2024-01-11T03:04:38Z)
- Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction [102.13536517783837]
Most languages from the Americas are low-resource, having a limited amount of parallel and monolingual data, if any.
We discuss recent advances, findings, and open questions that stem from the NLP community's increased interest in these languages.
(2023-06-11T23:27:47Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
(2022-01-26T22:12:55Z)
- Including Signed Languages in Natural Language Processing [48.62744923724317]
Signed languages are the primary means of communication for many deaf and hard of hearing individuals.
This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact.
(2021-05-11T17:37:55Z)
- OkwuGbé: End-to-End Speech Recognition for Fon and Igbo [0.015863809575305417]
We present a state-of-the-art ASR model for Fon, as well as benchmark ASR model results for Igbo.
We conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages.
(2021-03-13T18:02:44Z)
- SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection [81.85463892070085]
The SIGMORPHON 2020 task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages.
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
(2020-06-20T13:24:14Z)
- Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi [2.76240219662896]
We study the ability of multilingual language models to process an unseen dialect.
We take user-generated North African Arabic as our case study.
We show in zero-shot and unsupervised adaptation scenarios that multilingual language models are able to transfer to such an unseen dialect.
(2020-05-01T11:29:23Z)