Arabic Offensive Language Detection Using Machine Learning and Ensemble
Machine Learning Approaches
- URL: http://arxiv.org/abs/2005.08946v1
- Date: Sat, 16 May 2020 06:40:36 GMT
- Title: Arabic Offensive Language Detection Using Machine Learning and Ensemble
Machine Learning Approaches
- Authors: Fatemah Husain
- Abstract summary: The study shows a significant impact of applying the ensemble machine learning approach over the single-learner machine learning approach.
Among the trained ensemble machine learning classifiers, bagging performs best in offensive language detection, with an F1 score of 88%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the effect of applying a single-learner
machine learning approach and an ensemble machine learning approach to offensive
language detection in Arabic. Classifying Arabic social media text is
a very challenging task due to the ambiguity and informality of the written
format of the text. Arabic has multiple dialects with diverse
vocabularies and structures, which increases the complexity of obtaining high
classification performance. Our study shows a significant impact of applying
the ensemble machine learning approach over the single-learner machine learning
approach. Among the trained ensemble machine learning classifiers, bagging
performs best in offensive language detection, with an F1 score of 88%, which
exceeds the score obtained by the best single-learner classifier by 6%. Our
findings highlight the great opportunities of investing more effort in
promoting ensemble machine learning solutions for offensive
language detection models.
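The contrast between a single learner and a bagging ensemble, and the F1 metric used to compare them, can be sketched as follows. This is a minimal illustration and not the paper's implementation. The naive Bayes base learner, the number of estimators, the random seed, and the toy English placeholder texts (`rude_word`, `idiot_word`) are all assumptions made for the example. The abstract does not state which features, base learners, or data the study used.

```python
import math
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (assumed base learner)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab_size = len({w for wc in self.word_counts.values() for w in wc})
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for c, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + self.vocab_size
            score = math.log(n_docs / total)
            for w in text.split():
                # Laplace (add-one) smoothing so unseen words do not zero out a class
                score += math.log((self.word_counts[c][w] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def bagging_fit(texts, labels, n_estimators=11, seed=0):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(texts)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NaiveBayes().fit([texts[i] for i in idx],
                                       [labels[i] for i in idx]))
    return models


def bagging_predict(models, text):
    """Aggregate the base learners by majority vote."""
    votes = Counter(m.predict(text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 1 = offensive, 0 = not offensive.
TRAIN_TEXTS = [
    "you are a rude_word", "what a rude_word idiot_word", "idiot_word go away",
    "have a nice day", "thank you so much", "what a lovely day",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]
TEST_TEXTS = ["go away rude_word", "thank you for the lovely day"]
TEST_LABELS = [1, 0]

single = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
models = bagging_fit(TRAIN_TEXTS, TRAIN_LABELS)

single_preds = [single.predict(t) for t in TEST_TEXTS]
preds = [bagging_predict(models, t) for t in TEST_TEXTS]

print("single learner F1:", f1_score(TEST_LABELS, single_preds))
print("bagging F1:", f1_score(TEST_LABELS, preds))
```

On a dataset this small the two scores say nothing about the relative merit of the approaches. The sketch only shows where bootstrap sampling and vote aggregation enter the pipeline, and how an F1 comparison like the one reported in the abstract would be computed.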
Related papers
- Strategies for Arabic Readability Modeling [9.976720880041688]
Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility.
We present a set of experimental results on Arabic readability assessment using a diverse range of approaches.
arXiv Detail & Related papers (2024-07-03T11:54:11Z)
- An ensemble-based framework for mispronunciation detection of Arabic phonemes [0.0]
This work introduces an ensemble model that defines the mispronunciation of Arabic phonemes.
Experimental results show that voting as the ensemble algorithm, combined with Mel spectrogram feature extraction, achieves a classification accuracy of 95.9%.
arXiv Detail & Related papers (2023-01-03T22:17:08Z)
- AI-based Arabic Language and Speech Tutor [1.7616042687330644]
We present our approach for developing an Artificial Intelligence-based Arabic Language and Speech Tutor (AI-ALST).
The AI-ALST system is an intelligent tutor that provides analysis and assessment of students learning the Moroccan dialect at the University of Arizona (UA).
The AI-ALST provides a self-learned environment to practice each lesson for pronunciation training.
arXiv Detail & Related papers (2022-10-22T04:22:16Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Efficient Measuring of Readability to Improve Documents Accessibility for Arabic Language Learners [0.0]
The approach is based on machine learning classification methods to discriminate between different levels of difficulty in reading and understanding a text.
Several models were trained on a large corpus mined from online Arabic websites and manually annotated.
Best results were achieved using TF-IDF Vectors trained by a combination of word-based unigrams and bigrams with an overall accuracy of 87.14% over four classes of complexity.
arXiv Detail & Related papers (2021-09-09T10:05:38Z)
- The Challenges of Persian User-generated Textual Content: A Machine Learning-Based Approach [0.0]
This research applies machine learning-based approaches to tackle the hurdles that come with Persian user-generated textual content.
The presented approach uses machine-translated datasets to conduct sentiment analysis for the Persian language.
The results of the experiments have shown promising state-of-the-art performance in contrast to the previous efforts.
arXiv Detail & Related papers (2021-01-20T11:57:59Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.