Detection of Homophobia & Transphobia in Dravidian Languages: Exploring
Deep Learning Methods
- URL: http://arxiv.org/abs/2304.01241v1
- Date: Mon, 3 Apr 2023 12:15:27 GMT
- Title: Detection of Homophobia & Transphobia in Dravidian Languages: Exploring
Deep Learning Methods
- Authors: Deepawali Sharma, Vedika Gupta, Vivek Kumar Singh
- Abstract summary: Homophobia and transphobia constitute offensive comments against the LGBT+ community.
The paper explores the applicability of different deep learning models for classifying social media comments in Malayalam and Tamil.
- Score: 1.5687561161428403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increase in abusive content on online social media platforms is impacting
the social life of online users. The use of offensive and hate speech has been
making social media toxic. Homophobia and transphobia constitute offensive
comments against the LGBT+ community. It becomes imperative to detect and handle
these comments, to flag them or issue a timely warning to users indulging in such
behaviour. However, automated detection of such content is a challenging task,
more so in Dravidian languages, which are identified as low-resource languages.
Motivated by this, the paper explores the applicability of different
deep learning models for classifying social media comments in
Malayalam and Tamil as homophobic, transphobic, or
non-anti-LGBT+ content. The popularly used deep learning models - Convolutional
Neural Network (CNN), Long Short-Term Memory (LSTM) using GloVe embeddings, and
transformer-based models (Multilingual BERT and IndicBERT) - are applied
to the classification problem. The results show that IndicBERT outperforms
the other implemented models, with weighted average F1-scores of 0.86
and 0.77 for Malayalam and Tamil, respectively. The present work therefore
confirms the higher performance of IndicBERT on this task in the selected
Dravidian languages.
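The weighted average F1-score reported above is the per-class F1 averaged with each class weighted by its support (number of true examples). A minimal sketch of that computation in pure Python, where the three-class label names are illustrative stand-ins for the paper's homophobic / transphobic / non-anti-LGBT+ classes:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    f1_per_class = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_per_class[label] = f1
    total = len(y_true)
    return sum(f1_per_class[lab] * support[lab] / total for lab in labels)

# Illustrative three-class example (labels invented, not the paper's data)
y_true = ["non", "non", "non", "homo", "trans", "homo"]
y_pred = ["non", "non", "homo", "homo", "trans", "non"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.667
```

Because the "non-anti-LGBT+" class typically dominates such datasets, the support weighting makes this metric more forgiving of errors on the rare classes than macro-averaged F1.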
Related papers
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Breaking the Silence: Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks.
CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text.
BiLSTM analyzes this sequence for dependencies among words and phrases.
Validation scores showed strong performance across F1-measures, especially for English (0.84).
arXiv Detail & Related papers (2024-04-02T14:55:47Z) - Analysis and Detection of Multilingual Hate Speech Using Transformer
Based Deep Learning [7.332311991395427]
As the prevalence of hate speech increases online, the demand for automated detection as an NLP task is increasing.
In this work, the proposed method uses a transformer-based model to detect hate speech on social media platforms such as Twitter, Facebook, WhatsApp, and Instagram.
The gold-standard datasets were collected from renowned researchers Zeerak Talat, Sara Tonelli, Melanie Siegel, and Rezaul Karim.
The success rate of the proposed model for hate speech detection is higher than the existing baseline and state-of-the-art models, with an accuracy of 89% on the Bengali dataset, 91% on English, and in German
arXiv Detail & Related papers (2024-01-19T20:40:23Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - "I'm fully who I am": Towards Centering Transgender and Non-Binary
Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life.
We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation.
We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for
Detecting Homophobia and Transphobia in Social Media Comments [0.9981479937152642]
We present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments.
We experiment with a number of monolingual and multilingual transformer based models such as mBERT.
We observe their performance on a carefully annotated, real life dataset of YouTube comments in English as well as Tamil.
arXiv Detail & Related papers (2022-03-27T10:15:34Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Ceasing hate with MoH: Hate Speech Detection in Hindi-English
Code-Switched Language [2.9926023796813728]
This work focuses on analyzing hate speech in Hindi-English code-switched language.
To preserve the structure of the data, we developed MoH, or Map Only Hindi, which means "Love" in Hindi.
The MoH pipeline consists of language identification and Roman-to-Devanagari Hindi transliteration using a knowledge base of Roman Hindi words.
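The transliteration step in such a pipeline can be pictured as a lookup over a knowledge base of Roman Hindi words, leaving unknown (e.g. English) tokens untouched. The sketch below is a toy illustration under that assumption; the word list and function are invented, not the authors' MoH implementation:

```python
# Toy knowledge base mapping romanized Hindi tokens to Devanagari.
# Entries are illustrative only, not from the MoH paper's resources.
ROMAN_TO_DEVANAGARI = {
    "pyar": "प्यार",  # love
    "dost": "दोस्त",  # friend
    "nahi": "नहीं",   # no / not
}

def transliterate(tokens):
    """Replace known Roman Hindi tokens with Devanagari; pass others through unchanged."""
    return [ROMAN_TO_DEVANAGARI.get(tok.lower(), tok) for tok in tokens]

print(transliterate(["pyar", "is", "nahi", "hate"]))  # → ['प्यार', 'is', 'नहीं', 'hate']
```

Mapping all Hindi tokens into one script in this way gives a downstream classifier a single consistent representation of the code-switched text.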
arXiv Detail & Related papers (2021-10-18T15:24:32Z) - Evaluation of Deep Learning Models for Hostility Detection in Hindi Text [2.572404739180802]
We present approaches for hostile text detection in the Hindi language.
The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset.
We evaluate a host of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem.
arXiv Detail & Related papers (2021-01-11T19:10:57Z) - Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media
during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.