GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media
- URL: http://arxiv.org/abs/2504.02863v1
- Date: Tue, 01 Apr 2025 00:00:07 GMT
- Title: GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media
- Authors: Girma Yohannis Bade, Zahra Ahani, Olga Kolesnikova, José Luis Oropeza, Grigori Sidorov,
- Abstract summary: Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups.<n>This paper focuses on detecting abusive texts targeting women on social media platforms.
- Score: 4.573779790701493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing misuse of social media has become a concern; however, technological solutions are being developed to moderate its content effectively. This paper focuses on detecting abusive texts targeting women on social media platforms. Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups. Specifically, this study aims to identify abusive language directed toward women. To achieve this, we utilized logistic regression and BERT as base models to train datasets sourced from DravidianLangTech@2025 for Tamil and Malayalam languages. The models were evaluated on test datasets, resulting in a 0.729 macro F1 score for BERT and 0.6279 for logistic regression in Tamil and Malayalam, respectively.
Related papers
- Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks.
CNN captures localized features indicative of abusive language through its convolution filters applied on embedded input text.
BiLSTM analyzes this sequence for dependencies among words and phrases.
validation scores showed strong performance across f1-measures, especially for English 0.84.
arXiv Detail & Related papers (2024-04-02T14:55:47Z) - Exploratory Data Analysis on Code-mixed Misogynistic Comments [0.0]
We present a novel dataset of YouTube comments in mix-code Hinglish.
These comments have been weak labelled as Misogynistic' and Non-misogynistic'
arXiv Detail & Related papers (2024-03-09T23:21:17Z) - Analysis and Detection of Multilingual Hate Speech Using Transformer
Based Deep Learning [7.332311991395427]
As the prevalence of hate speech increases online, the demand for automated detection as an NLP task is increasing.
In this work, the proposed method is using transformer-based model to detect hate speech in social media, like twitter, Facebook, WhatsApp, Instagram, etc.
The Gold standard datasets were collected from renowned researcher Zeerak Talat, Sara Tonelli, Melanie Siegel, and Rezaul Karim.
The success rate of the proposed model for hate speech detection is higher than the existing baseline and state-of-the-art models with accuracy in Bengali dataset is 89%, in English: 91%, in German
arXiv Detail & Related papers (2024-01-19T20:40:23Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 106 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
Language Model Application [45.3863281375947]
Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data.
Existing research and resources are not readily applicable in South Korea due to the differences in language and culture.
We present KO SB I, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories.
arXiv Detail & Related papers (2023-05-28T12:07:16Z) - Detection of Homophobia & Transphobia in Dravidian Languages: Exploring
Deep Learning Methods [1.5687561161428403]
Homophobia and transphobia constitute offensive comments against LGBT+ community.
The paper attempts to explore applicability of different deep learning mod-els for classification of the social media comments in Malayalam and Tamil lan-guages.
arXiv Detail & Related papers (2023-04-03T12:15:27Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, m-Bert based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z) - COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose textscCOLDetector to study output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z) - Abusive and Threatening Language Detection in Urdu using Boosting based
and BERT based models: A Comparative Approach [0.0]
In this paper, we explore several machine learning models for abusive and threatening content detection in Urdu based on the shared task.
Our model came First for both abusive and threatening content detection with an F1scoreof 0.88 and 0.54, respectively.
arXiv Detail & Related papers (2021-11-27T20:03:19Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.