Walk in Wild: An Ensemble Approach for Hostility Detection in Hindi
Posts
- URL: http://arxiv.org/abs/2101.06004v1
- Date: Fri, 15 Jan 2021 07:49:27 GMT
- Title: Walk in Wild: An Ensemble Approach for Hostility Detection in Hindi
Posts
- Authors: Chander Shekhar, Bhavya Bagla, Kaushal Kumar Maurya, Maunendra Sankar
Desarkar
- Abstract summary: We develop a simple ensemble based model on pre-trained mBERT and popular classification algorithms like Artificial Neural Network (ANN) and XGBoost for hostility detection in Hindi posts.
We received third overall rank in the competition and weighted F1-scores of 0.969 and 0.61 on the binary and multi-label multi-class classification tasks respectively.
- Score: 3.9373541926236766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the reach of the internet increases, pejorative terms have started
flooding social media platforms, making it necessary to identify hostile
content there. Identifying hostile content in low-resource languages like
Hindi poses different challenges because of its diverse syntactic structure
compared to English. In this paper, we develop a simple
ensemble based model on pre-trained mBERT and popular classification algorithms
like Artificial Neural Network (ANN) and XGBoost for hostility detection in
Hindi posts. We formulated this problem as a binary classification task (hostile
vs. non-hostile) and a multi-label multi-class classification task (for more
fine-grained hostile classes). We received third overall rank in the
competition and weighted F1-scores of ~0.969 and ~0.61 on the binary and
multi-label multi-class classification tasks respectively.
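The abstract does not say how the ANN and XGBoost outputs are combined or how the score is computed in detail, so the following is only a minimal sketch under stated assumptions: soft voting (averaging per-label probabilities) is assumed as the ensemble rule, the 0.5 threshold is an assumed default, the label names are taken from the related-paper descriptions on this page, and all function names are hypothetical. It also shows how a support-weighted F1 score can be computed for multi-label predictions.

```python
from typing import List, Sequence

# Fine-grained hostile classes as named in the related shared-task
# descriptions (an assumption for this paper's exact label set).
LABELS = ["fake", "hate", "offensive", "defamation"]


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-label probabilities from several classifiers.

    Soft voting is an assumed combination rule, not necessarily the
    authors' method.
    """
    n_models = len(prob_sets)
    n_labels = len(prob_sets[0])
    return [sum(p[j] for p in prob_sets) / n_models for j in range(n_labels)]


def predict_labels(avg_probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose averaged probability reaches the threshold.

    A post with no hostile label is reported as non-hostile.
    """
    hostile = [name for name, p in zip(LABELS, avg_probs) if p >= threshold]
    return hostile or ["non-hostile"]


def weighted_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Mean of per-label F1 scores, weighted by each label's support."""
    n_labels = len(y_true[0])
    total_support = 0
    score = 0.0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        support = tp + fn  # number of true instances of this label
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += support * f1
        total_support += support
    return score / total_support if total_support else 0.0
```

For example, given hypothetical per-label probabilities from two classifiers, `predict_labels(soft_vote([ann_probs, xgb_probs]))` returns the list of predicted hostile classes, or `["non-hostile"]` when no label clears the threshold.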
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- Subsidiary Prototype Alignment for Universal Domain Adaptation [58.431124236254]
A major problem in Universal Domain Adaptation (UniDA) is misalignment of "known" and "unknown" classes.
We propose a novel word-histogram-related pretext task to enable closed-set SPA, operating in conjunction with goal task UniDA.
We demonstrate the efficacy of our approach on top of existing UniDA techniques, yielding state-of-the-art performance across three standard UniDA and Open-Set DA object recognition benchmarks.
arXiv Detail & Related papers (2022-10-28T05:32:14Z)
- Divide and Conquer: An Ensemble Approach for Hostile Post Detection in Hindi [25.723773314371947]
The data for this task is provided in Hindi Devanagari script which was collected from Twitter and Facebook.
It is a multi-label multi-class classification problem where each data instance is annotated into one or more of the five classes: fake, hate, offensive, defamation, and non-hostile.
Our team, 'Albatross', achieved a coarse-grained hostility F1 score of 0.9709 on the Hostile Post Detection in Hindi subtask and secured 2nd rank out of 45 teams for the task.
arXiv Detail & Related papers (2021-01-20T05:38:07Z)
- Coarse and Fine-Grained Hostility Detection in Hindi Posts using Fine Tuned Multilingual Embeddings [4.3012765978447565]
The hostility detection task has been well explored for resource-rich languages like English, but is unexplored for resource-constrained languages like Hindi due to the unavailability of large suitable data.
We propose an effective neural network-based technique for hostility detection in Hindi posts.
arXiv Detail & Related papers (2021-01-13T11:00:31Z)
- Task Adaptive Pretraining of Transformers for Hostility Detection [11.306581296760864]
We study two problems, namely, (a) Coarse binary classification of Hindi Tweets into Hostile or Not, and (b) Fine-grained multi-label classification of Tweets into four categories: hate, fake, offensive, and defamation.
Our system ranked first in the 'Hostile Post Detection in Hindi' shared task with an F1 score of 97.16% for coarse-grained detection and a weighted F1 score of 62.96% for fine-grained multi-label classification on the provided blind test corpora.
arXiv Detail & Related papers (2021-01-09T15:45:26Z)
- kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification [18.41476971318978]
Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language.
In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset.
Adversarial training with a multi-lingual model is used to achieve 1st place in the SemEval-2020 Task 9 Hindi-English sentiment classification competition.
arXiv Detail & Related papers (2020-09-08T12:20:04Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- "Hinglish" Language -- Modeling a Messy Code-Mixed Language [0.0]
This project focuses on using deep learning techniques to tackle a classification problem in categorizing social content written in Hindi-English into Abusive, Hate-Inducing and Not offensive categories.
We utilize bi-directional sequence models with easy text augmentation techniques such as synonym replacement, random insertion, random swap, and random deletion.
arXiv Detail & Related papers (2019-12-30T23:01:28Z)