Related papers: Exploratory Data Analysis on Code-mixed Misogynistic Comments

URL: http://arxiv.org/abs/2403.09709v1
Date: Sat, 9 Mar 2024 23:21:17 GMT
Title: Exploratory Data Analysis on Code-mixed Misogynistic Comments
Authors: Sargam Yadav, Abhishek Kaushik, Kevin McDaid
Abstract summary: We present a novel dataset of YouTube comments in code-mixed Hinglish. These comments have been weakly labelled as 'Misogynistic' and 'Non-misogynistic'.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The problems of online hate speech and cyberbullying have significantly worsened since the increase in popularity of social media platforms such as YouTube and Twitter (X). Natural Language Processing (NLP) techniques have proven to provide a great advantage in automatically filtering such toxic content. Women are disproportionately more likely to be victims of online abuse. However, there appears to be a lack of studies that tackle misogyny detection in under-resourced languages. In this short paper, we present a novel dataset of comments in code-mixed Hinglish, collected from YouTube videos and weakly labelled as 'Misogynistic' and 'Non-misogynistic'. Pre-processing and Exploratory Data Analysis (EDA) techniques have been applied to the dataset to gain insights into its characteristics. The process has provided a better understanding of the dataset through sentiment scores, word clouds, etc.
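
The kind of pre-processing and per-label exploration described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy comments, the `preprocess` and `token_counts_by_label` helpers, and the cleaning rules are all hypothetical. The per-label token counts it produces are the sort of frequency table a word cloud is typically drawn from.

```python
import re
from collections import Counter

# Hypothetical toy rows of (comment, weak label); not taken from the paper's dataset.
comments = [
    ("Yeh video bahut accha hai, nice work!", "Non-misogynistic"),
    ("Bahut accha explanation, thank you", "Non-misogynistic"),
    ("placeholder offensive comment", "Misogynistic"),
]


def preprocess(text):
    """Lowercase, drop URLs and punctuation, then split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation
    return text.split()


def token_counts_by_label(rows):
    """Return one Counter of token frequencies per label."""
    counts = {}
    for text, label in rows:
        counts.setdefault(label, Counter()).update(preprocess(text))
    return counts


counts = token_counts_by_label(comments)
for label, counter in counts.items():
    # The most frequent tokens per class are what a word cloud would emphasise.
    print(label, counter.most_common(3))
```

A real analysis would also need stop-word handling for both Hindi and English tokens and a sentiment scorer suited to code-mixed text; neither is assumed here.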

Related papers

Using psychological theory to ground guidelines for the annotation of misogynistic language [2.0391237204597368]
Misogyny is on the rise both online and offline. Current misogyny detection coding schemes and datasets fail to capture the ways women experience misogyny online. We present a case study using Large Language Models (LLMs) to compare our coding scheme to a self-described "expert" misogyny annotation scheme in the literature.
arXiv Detail & Related papers (2026-01-24T11:29:46Z)
Clicks, comments, consequences: Are content creators' socio-structural and platform characteristics shaping the exposure to negative sentiment, offensive language, and hate speech on YouTube? [0.0]
This study investigates how socio-structural characteristics such as the age, gender, and race of content creators (CCs), as well as platform features, play a role. We conduct a comprehensive analysis combining digital trace data with hand-coded variables to include socio-structural characteristics in social media data. Contrary to existing studies, our findings indicate that female content creators are confronted with less negative communication.
arXiv Detail & Related papers (2025-04-10T11:58:56Z)
GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media [4.573779790701493]
Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups. This paper focuses on detecting abusive texts targeting women on social media platforms.
arXiv Detail & Related papers (2025-04-01T00:00:07Z)
Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks. The CNN captures localized features indicative of abusive language through convolution filters applied to the embedded input text. The BiLSTM analyzes this sequence for dependencies among words and phrases. Validation scores showed strong performance across F1 measures, especially for English (0.84).
arXiv Detail & Related papers (2024-04-02T14:55:47Z)
Anti-Sexism Alert System: Identification of Sexist Comments on Social Media Using AI Techniques [0.0]
Sexist comments that are publicly posted on social media (newspaper comments, social networks, etc.) usually attract a lot of attention and go viral, with consequent damage to the persons involved. In this paper, we introduce an anti-sexism alert system based on natural language processing (NLP) and artificial intelligence (AI). This system analyzes any public post and decides whether it could be considered a sexist comment.
arXiv Detail & Related papers (2023-11-28T19:48:46Z)
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset [5.528106559459623]
The Biasly dataset is built in collaboration with multi-disciplinary experts and annotators themselves. The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites.
arXiv Detail & Related papers (2023-11-15T23:27:19Z)
Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B. We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively. We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts. Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
Topological Data Mapping of Online Hate Speech, Misinformation, and General Mental Health: A Large Language Model Based Study [6.803493330690884]
Recent advances in machine learning and large language models have made such an analysis possible. In this study, we collected thousands of posts from carefully selected communities on the social media site Reddit. We performed various machine-learning classifications based on embeddings in order to understand the role of hate speech/misinformation in various communities.
arXiv Detail & Related papers (2023-09-22T15:10:36Z)
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems. This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content. The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z)
Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities. We study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
Developing a Multilingual Annotated Corpus of Misogyny and Aggression [1.0187588674939276]
We discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla. The dataset is collected from comments on YouTube videos and currently contains a total of over 20,000 comments.
arXiv Detail & Related papers (2020-03-16T20:19:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.