Garain at SemEval-2020 Task 12: Sequence based Deep Learning for
Categorizing Offensive Language in Social Media
- URL: http://arxiv.org/abs/2009.01195v1
- Date: Wed, 2 Sep 2020 17:09:29 GMT
- Title: Garain at SemEval-2020 Task 12: Sequence based Deep Learning for
Categorizing Offensive Language in Social Media
- Authors: Avishek Garain
- Abstract summary: SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language Identification in Social Media.
My system on training on 25% of the whole dataset macro averaged f1 score of 47.763%.
- Score: 3.236217153362305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language
Identification in Social Media (Zampieri et al., 2020). The task was subdivided
into multiple languages and datasets were provided for each one. The task was
further divided into three sub-tasks: offensive language identification,
automatic categorization of offense types, and offense target identification. I
have participated in the task-C, that is, offense target identification. For
preparing the proposed system, I have made use of Deep Learning networks like
LSTMs and frameworks like Keras which combine the bag of words model with
automatically generated sequence based features and manually extracted features
from the given dataset. My system on training on 25% of the whole dataset
achieves macro averaged f1 score of 47.763%.
Related papers
- DeMuX: Data-efficient Multilingual Learning [57.37123046817781]
DEMUX is a framework that prescribes exact data-points to label from vast amounts of unlabelled multilingual data.
Our end-to-end framework is language-agnostic, accounts for model representations, and supports multilingual target configurations.
arXiv Detail & Related papers (2023-11-10T20:09:08Z) - Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation
for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D (Navya3DSeg), with a diverse label space corresponding to a large scale production grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z) - MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question
Answering for 16 Diverse Languages [54.002969723086075]
We evaluate cross-lingual open-retrieval question answering systems in 16 typologically diverse languages.
The best system leveraging iteratively mined diverse negative examples achieves 32.2 F1, outperforming our baseline by 4.5 points.
The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
arXiv Detail & Related papers (2022-07-02T06:54:10Z) - Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z) - Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z) - Identifying and Categorizing Offensive Language in Social Media [0.0]
This study provides a description of a classification system built for SemEval 2019 Task 6: OffensEval.
We trained machine learning and deep learning models along with data preprocessing and sampling techniques to come up with the best results.
arXiv Detail & Related papers (2021-04-10T22:53:43Z) - Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive
Language Identification using Pre-trained Language Models [11.868582973877626]
This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media.
For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R.
For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models.
arXiv Detail & Related papers (2020-10-07T17:40:19Z) - WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification
on Different Twitter Datasets [0.0]
A key to fight offensive language on social media is the existence of an automatic offensive language detection system.
In this paper, we describe the system submitted by WideBot AI Lab for the shared task which ranked 10th out of 52 participants with Macro-F1 86.9%.
We also introduced a neural network approach that enhanced the predictive ability of our system that includes CNN, highway network, Bi-LSTM, and attention layers.
arXiv Detail & Related papers (2020-09-11T14:10:03Z) - Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in
English with Logistic Regression [0.0]
This paper describes the systems that participated in Duluth SemEval--2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval--2020).
We trained our models on the distantly supervised training data made available by the task organizers and used no other resources.
As might be expected we did not rank highly in the comparative evaluation: 79th of 85 in Task A, 34th of 43 in Task B, and 24th of 39 in Task C.
arXiv Detail & Related papers (2020-07-25T14:49:31Z) - SemEval-2020 Task 12: Multilingual Offensive Language Identification in
Social Media (OffensEval 2020) [33.66689662526814]
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
OffensEval 2020 was one of the most popular tasks at SemEval-2020 attracting a large number of participants across all subtasks and also across all languages.
A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
arXiv Detail & Related papers (2020-06-12T14:39:40Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.