Sexism detection: The first corpus in Algerian dialect with a
code-switching in Arabic/ French and English
- URL: http://arxiv.org/abs/2104.01443v1
- Date: Sat, 3 Apr 2021 16:34:51 GMT
- Title: Sexism detection: The first corpus in Algerian dialect with a
code-switching in Arabic/ French and English
- Authors: Imane Guellil and Ahsan Adeel and Faical Azouaou and Mohamed Boubred
and Yousra Houichi and Akram Abdelhaq Moumna
- Abstract summary: A new hate speech corpus (Arabic_fr_en) is developed using three different annotators.
For corpus validation, three different machine learning algorithms are used, including deep Convolutional Neural Network (CNN), long short-term memory (LSTM) network and Bi-directional LSTM (Bi-LSTM) network.
Simulation results show that the CNN model performs best, achieving an F1-score of up to 86% on the unbalanced corpus.
- Score: 0.3425341633647625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, an approach for hate speech detection against women in the
Arabic community on social media (e.g. YouTube) is proposed. In the literature,
similar works have been presented for other languages such as English. However,
to the best of our knowledge, not much work has been conducted in the Arabic
language. A new hate speech corpus (Arabic_fr_en) is developed using three
different annotators. For corpus validation, three different machine learning
algorithms are used, including deep Convolutional Neural Network (CNN), long
short-term memory (LSTM) network and Bi-directional LSTM (Bi-LSTM) network.
Simulation results demonstrate the best performance of the CNN model, which
achieved an F1-score of up to 86% for the unbalanced corpus, as compared to LSTM
and Bi-LSTM.
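The abstract reports an F1-score on an unbalanced corpus but does not say how it was averaged. As a rough illustration of why F1 is preferred over accuracy in that setting, here is a minimal sketch of the standard binary F1 for the positive class. The function name, the toy labels, and the choice of positive-class F1 are all assumptions made for illustration; none of them come from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class, from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (1 = sexist, 0 = not sexist); not taken from the corpus.
y_true = [0] * 8 + [1] * 2           # unbalanced: 8 negatives, 2 positives
always_negative = [0] * 10           # a classifier that never flags anything
one_hit = [0] * 8 + [1, 0]           # finds one of the two positives

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
f1_always_negative = f1_score(y_true, always_negative)
f1_one_hit = f1_score(y_true, one_hit)

print(accuracy, f1_always_negative, round(f1_one_hit, 3))
```

On this toy data the always-negative classifier looks strong by accuracy yet scores zero F1, which is the behaviour that makes F1 the more informative number for an unbalanced corpus.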
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- A Unified Multi-Task Learning Architecture for Hate Detection Leveraging User-Based Information [23.017068553977982]
Hate speech, offensive language, aggression, racism, sexism, and other abusive language are common phenomena in social media.
There is a need for Artificial Intelligence (AI) based intervention which can filter hate content at scale.
This paper introduces a unique model that improves hate speech identification for the English language by utilising intra-user and inter-user-based information.
arXiv Detail & Related papers (2024-11-11T10:37:11Z)
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
- Interpreting Arabic Transformer Models [18.98681439078424]
We probe how linguistic information is encoded in Arabic pretrained models, trained on different varieties of Arabic language.
We perform a layer and neuron analysis on the models using three intrinsic tasks: two morphological tagging tasks based on MSA (modern standard Arabic) and dialectal POS-tagging and a dialectal identification task.
arXiv Detail & Related papers (2022-01-19T06:32:25Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Effect of Word Embedding Models on Hate and Offensive Speech Detection [1.7403133838762446]
We investigate the impact of both word embedding models and neural network architectures on the predictive accuracy.
We first train several word embedding models on a large-scale unlabelled Arabic text corpus.
For each detection task, we train several neural network classifiers using the pre-trained word embedding models.
This task yields a large number of various learned models, which allows conducting an exhaustive comparison.
arXiv Detail & Related papers (2020-11-23T02:43:45Z)
- "Did you really mean what you said?" : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings [0.0]
We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection.
We propose a deep learning based approach to address the issue of sarcasm detection in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2020-10-01T11:41:44Z)
- "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation [49.610188741500274]
An end-to-end speech-to-text translation (ST) takes audio in a source language and outputs the text in a target language.
Existing methods are limited by the amount of parallel corpus.
We build a system to fully utilize signals in a parallel ST corpus.
arXiv Detail & Related papers (2020-09-21T09:19:07Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment Classification Using Candidate Sentence Generation and Selection [1.2301855531996841]
Code-mixing adds to the challenge of analyzing the sentiment of the text due to the non-standard writing style.
We present a candidate sentence generation and selection based approach on top of the Bi-LSTM based neural classifier.
The proposed approach shows an improvement in the system performance as compared to the Bi-LSTM based neural classifier.
arXiv Detail & Related papers (2020-06-25T14:59:47Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.