WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification
on Different Twitter Datasets
- URL: http://arxiv.org/abs/2009.05456v1
- Date: Fri, 11 Sep 2020 14:10:03 GMT
- Title: WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification
on Different Twitter Datasets
- Authors: Yasser Otiefy (WideBot), Ahmed Abdelmalek (WideBot), Islam El Hosary
(WideBot)
- Abstract summary: A key to fighting offensive language on social media is the existence of an automatic offensive language detection system.
In this paper, we describe the system submitted by WideBot AI Lab to the shared task, which ranked 10th out of 52 participants with a Macro-F1 of 86.9%.
We also introduce a neural network approach, comprising CNN, highway network, Bi-LSTM, and attention layers, that enhanced the predictive ability of our system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communicating through social platforms has become one of the principal
means of personal communication and interaction. Unfortunately, healthy
communication is often disrupted by offensive language that can have damaging
effects on users. A key to fighting offensive language on social media is the
existence of an automatic offensive language detection system. This paper
presents the results and main findings for SemEval-2020 Task 12, OffensEval
Sub-task A (Zampieri et al., 2020), on identifying and categorising offensive
language in social media. The task was based on the Arabic OffensEval dataset
(Mubarak et al., 2020). In this paper, we describe the system submitted by
WideBot AI Lab to the shared task, which ranked 10th out of 52 participants
with a Macro-F1 of 86.9% on the golden dataset, under the CodaLab username
"yasserotiefy". We experimented with various models; the best model is a
linear SVM that uses a combination of character and word n-grams. We also
introduce a neural network approach, comprising CNN, highway network, Bi-LSTM,
and attention layers, that enhanced the predictive ability of our system.
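The paper does not include code, but the best-performing configuration described above (a linear SVM over combined character and word n-grams) maps naturally onto a scikit-learn pipeline. The sketch below is illustrative only: the n-gram ranges, TF-IDF weighting, regularisation constant, and label names are assumptions, not the authors' reported settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Word- and character-level n-grams are extracted in parallel and concatenated;
# character n-grams help with the spelling variation common in Arabic tweets.
model = Pipeline([
    ("features", FeatureUnion([
        ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char_ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ])),
    ("svm", LinearSVC(C=1.0)),  # C is a placeholder, not the reported setting
])

# Hypothetical usage: `tweets` is a list of strings, `labels` holds the task's
# offensive / not-offensive tags.
# model.fit(tweets, labels)
# predictions = model.predict(test_tweets)
```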
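For the neural approach, only the components are named (CNN, highway network, Bi-LSTM, attention); the exact arrangement, layer sizes, and training details are not given here. The tf.keras sketch below is one plausible assembly of those components under assumed hyperparameters, not the authors' architecture.

```python
from tensorflow.keras import layers, Model

VOCAB_SIZE, MAX_LEN, EMB_DIM = 50_000, 64, 128  # assumed values

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)

# Convolution over the token embeddings
x = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)

# Highway layer: a sigmoid gate mixes a non-linear transform of x with x itself
transform = layers.Dense(128, activation="relu")(x)
gate = layers.Dense(128, activation="sigmoid")(x)
carry = layers.Lambda(lambda g: 1.0 - g)(gate)
x = layers.Add()([layers.Multiply()([gate, transform]),
                  layers.Multiply()([carry, x])])

# Bi-LSTM over the gated features
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Attention pooling: score each time step, softmax, weighted sum over time
scores = layers.Dense(1, activation="tanh")(h)   # (batch, MAX_LEN, 1)
weights = layers.Softmax(axis=1)(scores)         # attention weights
context = layers.Dot(axes=1)([weights, h])       # (batch, 1, 128)
context = layers.Flatten()(context)

outputs = layers.Dense(1, activation="sigmoid")(context)  # offensive vs. not
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```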
Related papers
- Social Support Detection from Social Media Texts [44.096359084699]
Social support, conveyed through a multitude of interactions and platforms such as social media, plays a pivotal role in fostering a sense of belonging.
This paper introduces Social Support Detection (SSD) as a natural language processing (NLP) task aimed at identifying supportive interactions.
We conducted experiments on a dataset comprising 10,000 YouTube comments.
arXiv Detail & Related papers (2024-11-04T20:23:03Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, the m-BERT based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z) - You Don't Know My Favorite Color: Preventing Dialogue Representations
from Revealing Speakers' Private Personas [44.82330540456883]
We show that speakers' personas can be inferred through a simple neural network with high accuracy.
We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%.
arXiv Detail & Related papers (2022-04-26T09:36:18Z) - Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z) - Identifying and Categorizing Offensive Language in Social Media [0.0]
This study provides a description of a classification system built for SemEval 2019 Task 6: OffensEval.
We trained machine learning and deep learning models, combined with data preprocessing and sampling techniques, to obtain the best results.
arXiv Detail & Related papers (2021-04-10T22:53:43Z) - UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection
on Social Media by Fine-tuning a Variety of BERT-based Models [0.0]
This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages.
They were employed in Sub-task A of the OffensEval 2020 shared task.
arXiv Detail & Related papers (2020-10-26T14:28:29Z) - BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive
Language Identification in Social Media [9.710464466895521]
We present a multilingual deep learning model to identify offensive language in social media.
The approach achieves acceptable evaluation scores, while maintaining flexibility between languages.
arXiv Detail & Related papers (2020-10-13T10:39:14Z) - Garain at SemEval-2020 Task 12: Sequence based Deep Learning for
Categorizing Offensive Language in Social Media [3.236217153362305]
SemEval-2020 Task 12 was OffensEval: Multilingual Offensive Language Identification in Social Media.
Trained on 25% of the whole dataset, my system achieved a macro-averaged F1 score of 47.763%.
arXiv Detail & Related papers (2020-09-02T17:09:29Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves an F1 score of 91.51% in English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.