NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic
Offensive Language Detection in Arabic Tweets
- URL: http://arxiv.org/abs/2007.13339v1
- Date: Mon, 27 Jul 2020 07:44:00 GMT
- Title: NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic
Offensive Language Detection in Arabic Tweets
- Authors: Hamada A. Nayel
- Abstract summary: The proposed system aims to automatically identify offensive language in Arabic tweets.
A machine learning-based approach has been used to design our system.
The best-performing system and the lowest-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present the system submitted to "SemEval-2020 Task 12". The
proposed system aims to automatically identify offensive language in Arabic
tweets. A machine learning-based approach has been used to design our system.
We implemented a linear classifier with Stochastic Gradient Descent (SGD) as
the optimization algorithm. Our model reported F1-scores of 84.20% and 81.82%
on the development set and the test set, respectively. The best-performing
system and the lowest-ranked system reported F1-scores of 90.17% and 44.51% on
the test set, respectively.
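A minimal sketch of the described pipeline is given below. The abstract only specifies TF/IDF features and a linear classifier trained with SGD, so the scikit-learn implementation, file names, column names, and hyperparameters are illustrative assumptions rather than the authors' actual code.

    # Minimal sketch: TF-IDF features + SGD-trained linear classifier,
    # assuming scikit-learn. File names, column names ("tweet", "label")
    # and hyperparameters are hypothetical, not taken from the paper.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import f1_score
    from sklearn.pipeline import Pipeline

    # Hypothetical tab-separated files with columns "tweet" and "label".
    train = pd.read_csv("offenseval-ar-train.tsv", sep="\t")
    dev = pd.read_csv("offenseval-ar-dev.tsv", sep="\t")

    model = Pipeline([
        # Character n-grams are a common choice for noisy Arabic tweets;
        # word n-grams would work similarly.
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
        # loss="hinge" yields a linear SVM trained with SGD;
        # loss="log_loss" would give logistic regression instead.
        ("clf", SGDClassifier(loss="hinge", alpha=1e-4, random_state=42)),
    ])

    model.fit(train["tweet"], train["label"])
    dev_pred = model.predict(dev["tweet"])
    print("Dev macro-F1:", f1_score(dev["label"], dev_pred, average="macro"))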
Related papers
- Exploiting prompt learning with pre-trained language models for
Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial for facilitating preventive care and delaying further progression.
This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling responses to be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers [63.835172924290326]
We present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS).
We propose and explain the design and development of a system for SAS, namely AutoSAS.
AutoSAS shows state-of-the-art performance, improving results by over 8% on some of the question prompts.
arXiv Detail & Related papers (2020-12-21T10:47:30Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, and we were ranked sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
- LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets [4.361526134899725]
We describe our system for WNUT-2020 shared task on the identification of informative COVID-19 English tweets.
Our system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers as well as recent advances in pre-trained language models.
Our best performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test-set.
arXiv Detail & Related papers (2020-09-08T16:29:25Z)
- Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [3.236217153362305]
SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language Identification in Social Media.
Trained on 25% of the whole dataset, my system achieved a macro-averaged F1-score of 47.763%.
arXiv Detail & Related papers (2020-09-02T17:09:29Z)
- Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text (Hinglish) [3.007778295477907]
This system uses Weka to provide the classifier for classifying tweets.
Python is used for loading the data from the provided files and cleaning it.
The system performance was assessed using the official competition evaluation metric F1-score.
arXiv Detail & Related papers (2020-08-26T06:30:43Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing [13.638230797979917]
We present our participation in SemEval-2020 Task-12 Subtask-A (English Language) which focuses on offensive language identification from noisy labels.
We developed a hybrid system with the BERT classifier trained with tweets selected using Statistical Sampling Algorithm (SA) and Post-Processed (PP) using an offensive wordlist.
Our developed system achieved 34th position with a macro-averaged F1-score (Macro-F1) of 0.90913 over both offensive and non-offensive classes.
arXiv Detail & Related papers (2020-05-01T10:16:40Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)