Divide and Conquer: An Ensemble Approach for Hostile Post Detection in
Hindi
- URL: http://arxiv.org/abs/2101.07973v1
- Date: Wed, 20 Jan 2021 05:38:07 GMT
- Title: Divide and Conquer: An Ensemble Approach for Hostile Post Detection in
Hindi
- Authors: Varad Bhatnagar, Prince Kumar, Sairam Moghili and Pushpak
Bhattacharyya
- Abstract summary: The data for this task, collected from Twitter and Facebook, is provided in Hindi Devanagari script.
It is a multi-label multi-class classification problem where each data instance is annotated into one or more of the five classes: fake, hate, offensive, defamation, and non-hostile.
Our team, 'Albatross', achieved a coarse-grained hostility F1 score of 0.9709 on the Hostile Post Detection in Hindi subtask and secured 2nd rank out of 45 teams for the task.
- Score: 25.723773314371947
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently the NLP community has started showing interest in the
challenging task of Hostile Post Detection. This paper presents our system for
the Shared Task at Constraint2021 on "Hostile Post Detection in Hindi". The data
for this shared task, collected from Twitter and Facebook, is provided in Hindi
Devanagari script. It is a multi-label multi-class classification
problem where each data instance is annotated into one or more of the five
classes: fake, hate, offensive, defamation, and non-hostile. We propose a
two-level architecture made up of BERT-based classifiers and statistical
classifiers to solve this problem. Our team, 'Albatross', achieved a
coarse-grained hostility F1 score of 0.9709 on the Hostile Post Detection in
Hindi subtask and secured 2nd rank out of 45 teams for the task. Our
submissions ranked 2nd and 3rd out of a total of 156 submissions, with
coarse-grained hostility F1 scores of 0.9709 and 0.9703 respectively. Our
fine-grained scores are also very
encouraging and can be improved with further finetuning. The code is publicly
available.
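The label scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names (encode, is_hostile, f1) are hypothetical, it assumes that non-hostile excludes the four hostile tags, and it computes a plain F1 for the hostile class, since the abstract does not state how the shared task averages its coarse-grained score.

```python
# Illustrative only: multi-hot encoding of the fine-grained tags and a
# coarse-grained (hostile vs. non-hostile) F1 derived from it.
FINE_LABELS = ["fake", "hate", "offensive", "defamation"]


def encode(tags):
    """Return a multi-hot vector over the four hostile classes.

    Assumption: a non-hostile post carries no hostile tag, so it maps to
    the all-zero vector.
    """
    tags = set(tags)
    if "non-hostile" in tags and len(tags) > 1:
        raise ValueError("non-hostile cannot be combined with hostile tags")
    return [1 if label in tags else 0 for label in FINE_LABELS]


def is_hostile(vector):
    """Collapse a multi-hot vector to the coarse label (1 = hostile)."""
    return int(any(vector))


def f1(gold, pred, positive=1):
    """Plain F1 for one positive class, computed from label sequences."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up annotations, not taken from the dataset.
    gold = [encode(t) for t in (["fake", "hate"], ["non-hostile"], ["offensive"])]
    pred = [encode(t) for t in (["fake"], ["hate"], ["offensive"])]
    coarse_gold = [is_hostile(v) for v in gold]
    coarse_pred = [is_hostile(v) for v in pred]
    print(f1(coarse_gold, coarse_pred))
```

A fine-grained score could be obtained the same way by calling f1 on one column of the multi-hot vectors at a time.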
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification problem, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-Bert based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu [55.41644538483948]
This study reports on the second shared task, UrduFake@FIRE2021, on fake news detection in the Urdu language.
The proposed systems were based on various count-based features and used different classifiers as well as neural network architectures.
The stochastic gradient descent (SGD) classifier outperformed the other classifiers and achieved an F-score of 0.679.
arXiv Detail & Related papers (2022-07-11T19:15:04Z)
- Walk in Wild: An Ensemble Approach for Hostility Detection in Hindi Posts [3.9373541926236766]
We develop a simple ensemble based model on pre-trained mBERT and popular classification algorithms like Artificial Neural Network (ANN) and XGBoost for hostility detection in Hindi posts.
We received third overall rank in the competition and weighted F1-scores of 0.969 and 0.61 on the binary and multi-label multi-class classification tasks respectively.
arXiv Detail & Related papers (2021-01-15T07:49:27Z)
- Hostility Detection in Hindi leveraging Pre-Trained Language Models [1.6436293069942312]
This paper presents a transfer learning based approach to classify social media posts in Hindi Devanagari script as Hostile or Non-Hostile.
Hostile posts are further analyzed to determine if they are Hateful, Fake, Defamation, and Offensive.
We establish a robust and consistent model without any ensembling or complex pre-processing.
arXiv Detail & Related papers (2021-01-14T08:04:32Z)
- LaDiff ULMFiT: A Layer Differentiated training approach for ULMFiT [0.0]
We propose a Layer Differentiated training procedure for training a pre-trained ULMFiT arXiv:1801.06146 model.
We used special tokens to annotate specific parts of the tweets to improve language understanding and gain insights on the model.
The proposed approach ranked 61st out of 164 in the sub-task "COVID19 Fake News Detection in English".
arXiv Detail & Related papers (2021-01-13T09:52:04Z)
- Detecting Hostile Posts using Relational Graph Convolutional Network [1.8734449181723827]
This work is based on a submission to the competition conducted at AAAI@2021 for the detection of hostile posts in Hindi on social media platforms.
Here, a model is presented for classification of hostile posts using Convolutional Networks.
The proposed model performs on par with Google's XLM-RoBERTa on the given dataset.
Among all submissions to the challenge, our classification system with XLMRoberta secured 2nd rank on fine-grained classification.
arXiv Detail & Related papers (2021-01-10T06:50:22Z)
- Combating Hostility: Covid-19 Fake News and Hostile Post Detection in Social Media [0.0]
This paper gives a detailed description of the system, and its results, developed as part of our participation in the CONSTRAINT shared task at AAAI-2021.
Various techniques are used to perform the classification task, including SVM, CNN, BiLSTM, and CNN+BiLSTM with tf-idf and Word2Vec embedding techniques.
arXiv Detail & Related papers (2021-01-09T05:15:41Z)
- Hostility Detection Dataset in Hindi [44.221862384125245]
We collect and manually annotate 8200 online posts in Hindi language.
The dataset is considered for multi-label tags due to a significant overlap among the hostile classes.
arXiv Detail & Related papers (2020-11-06T20:33:12Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)