Coarse and Fine-Grained Hostility Detection in Hindi Posts using Fine
Tuned Multilingual Embeddings
- URL: http://arxiv.org/abs/2101.04998v1
- Date: Wed, 13 Jan 2021 11:00:31 GMT
- Title: Coarse and Fine-Grained Hostility Detection in Hindi Posts using Fine
Tuned Multilingual Embeddings
- Authors: Arkadipta De, Venkatesh E, Kaushal Kumar Maurya, Maunendra Sankar
Desarkar
- Abstract summary: The hostility detection task has been well explored for resource-rich languages like English, but is unexplored for resource-constrained languages like Hindi due to the unavailability of large suitable data.
We propose an effective neural network-based technique for hostility detection in Hindi posts.
- Score: 4.3012765978447565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the wide adoption of social media platforms like Facebook, Twitter,
etc., there is an emerging need to detect online posts that go against
the community acceptance standards. The hostility detection task has been well
explored for resource-rich languages like English, but is unexplored for
resource-constrained languages like Hindi due to the unavailability of large
suitable data. We view hostility detection as a multi-label multi-class
classification problem. We propose an effective neural network-based technique
for hostility detection in Hindi posts. We leverage pre-trained multilingual
Bidirectional Encoder Representations from Transformers (mBERT) to obtain the
contextual representations of Hindi posts. We have performed extensive
experiments including different pre-processing techniques, pre-trained models,
neural architectures, hybrid strategies, etc. Our best-performing neural
classifier uses a One-vs-the-Rest approach, with which we obtained F1 scores of
92.60%, 81.14%, 69.59%, 75.29%, and 73.01% for the hostile, fake, hate,
offensive, and defamation labels, respectively. The proposed model outperformed
the existing baseline models and emerged as the state-of-the-art model for
detecting hostility in Hindi posts.
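The multi-label framing and per-label F1 reporting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`binarize`, `f1_score`, `evaluate`), the toy posts, and the hard-coded scores are all hypothetical stand-ins for the outputs of per-label binary classifiers trained on mBERT representations. Deriving the coarse "hostile" prediction from the fine-grained ones is also an assumption made for this example only.

```python
from typing import Dict, List, Set

# Fine-grained labels named in the abstract; "hostile" is the coarse label.
FINE_LABELS = ["fake", "hate", "offensive", "defamation"]


def binarize(gold: List[Set[str]], label: str) -> List[int]:
    """One-vs-the-Rest target: 1 if the post carries `label`, else 0."""
    return [1 if label in labels else 0 for labels in gold]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    gold: List[Set[str]],
    scores: List[Dict[str, float]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Score each label independently, as a One-vs-the-Rest setup would."""
    results = {}
    for label in FINE_LABELS:
        y_true = binarize(gold, label)
        y_pred = [1 if s[label] >= threshold else 0 for s in scores]
        results[label] = f1_score(y_true, y_pred)
    # Assumption: a post counts as hostile if any fine-grained label fires.
    y_true = [1 if labels else 0 for labels in gold]
    y_pred = [
        1 if any(s[l] >= threshold for l in FINE_LABELS) else 0 for s in scores
    ]
    results["hostile"] = f1_score(y_true, y_pred)
    return results


# Toy gold annotations: each post has a (possibly empty) set of labels.
gold = [{"fake"}, {"hate", "offensive"}, set(), {"defamation"}, {"fake", "hate"}]

# Placeholder scores standing in for per-label classifier outputs.
scores = [
    {"fake": 0.9, "hate": 0.2, "offensive": 0.1, "defamation": 0.3},
    {"fake": 0.1, "hate": 0.8, "offensive": 0.4, "defamation": 0.2},
    {"fake": 0.6, "hate": 0.1, "offensive": 0.2, "defamation": 0.1},
    {"fake": 0.2, "hate": 0.3, "offensive": 0.1, "defamation": 0.7},
    {"fake": 0.7, "hate": 0.9, "offensive": 0.2, "defamation": 0.1},
]

results = evaluate(gold, scores)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Because every label gets its own binary decision, a post can receive several labels at once or none at all, which is what distinguishes this setup from single-label multi-class classification.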
Related papers
- Multilingual Bias Detection and Mitigation for Indian Languages [12.957036336552372]
Lack of diverse perspectives causes neutrality bias in Wikipedia content leading to millions of worldwide readers getting exposed.
We contribute two large datasets, mWikiBias and mWNC, covering 8 languages, for the bias detection and mitigation tasks respectively.
Next, we investigate the effectiveness of popular multilingual Transformer-based models for the two tasks by modeling detection as a binary classification problem and mitigation as a style transfer problem.
arXiv Detail & Related papers (2023-12-23T07:36:20Z)
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media [1.8899300124593648]
Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Walk in Wild: An Ensemble Approach for Hostility Detection in Hindi Posts [3.9373541926236766]
We develop a simple ensemble based model on pre-trained mBERT and popular classification algorithms like Artificial Neural Network (ANN) and XGBoost for hostility detection in Hindi posts.
We received third overall rank in the competition and weighted F1-scores of 0.969 and 0.61 on the binary and multi-label multi-class classification tasks respectively.
arXiv Detail & Related papers (2021-01-15T07:49:27Z)
- Hostility Detection in Hindi leveraging Pre-Trained Language Models [1.6436293069942312]
This paper presents a transfer learning based approach to classify social media posts in Hindi Devanagari script as Hostile or Non-Hostile.
Hostile posts are further analyzed to determine if they are Hateful, Fake, Defamation, and Offensive.
We establish a robust and consistent model without any ensembling or complex pre-processing.
arXiv Detail & Related papers (2021-01-14T08:04:32Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.