An Under-Explored Application for Explainable Multimodal Misogyny Detection in Code-Mixed Hindi-English
- URL: http://arxiv.org/abs/2601.08457v1
- Date: Tue, 13 Jan 2026 11:31:55 GMT
- Title: An Under-Explored Application for Explainable Multimodal Misogyny Detection in Code-Mixed Hindi-English
- Authors: Sargam Yadav, Abhishek Kaushik, Kevin Mc Daid
- Abstract summary: We present a multimodal and explainable web application for detecting misogyny in text and memes in code-mixed languages. The application aims to serve as a tool for both researchers and content moderators, to promote further research in the field.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Digital platforms have an ever-expanding user base and act as a hub for communication, business, and connectivity. However, this has also allowed hate speech and misogyny to spread. Artificial intelligence models have emerged as an effective tool for countering online hate speech, but they remain under-explored for low-resource and code-mixed languages and suffer from a lack of interpretability. Explainable Artificial Intelligence (XAI) can enhance transparency in the decisions of deep learning models, which is crucial for a sensitive domain such as hate speech detection. In this paper, we present a multimodal and explainable web application for detecting misogyny in text and memes in code-mixed Hindi and English. The system leverages state-of-the-art transformer-based models that support multilingual and multimodal settings. For text-based misogyny identification, the system uses XLM-RoBERTa (XLM-R) and multilingual Bidirectional Encoder Representations from Transformers (mBERT) trained on a dataset of 4,193 comments. For multimodal misogyny identification from memes, it uses mBERT + EfficientNet and mBERT + ResNet trained on a dataset of 4,218 memes. It also provides feature importance scores using explainability techniques, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). The application aims to serve as a tool for both researchers and content moderators, to promote further research in the field, combat gender-based digital violence, and help ensure a safe digital space. The system has been evaluated by human evaluators, who provided responses on the Chatbot Usability Questionnaire (CUQ) and the User Experience Questionnaire (UEQ) to determine overall usability.
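As an illustration of the meme pipeline, the sketch below shows one plausible way to combine an mBERT text encoder with an EfficientNet image encoder via late fusion. The abstract does not specify the fusion mechanism, so the concatenation of pooled features, the hidden sizes, and the classifier head here are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights
from transformers import AutoModel

class MemeMisogynyClassifier(nn.Module):
    """Late-fusion mBERT + EfficientNet sketch (fusion details assumed)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # mBERT yields a 768-d pooled sentence representation.
        self.text_encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        # EfficientNet-B0 with its classifier removed yields 1280-d image features.
        self.image_encoder = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)
        self.image_encoder.classifier = nn.Identity()
        self.head = nn.Sequential(
            nn.Linear(768 + 1280, 256),  # hidden size is an assumption
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                                # (B, 768)
        image_feat = self.image_encoder(pixel_values)  # (B, 1280)
        return self.head(torch.cat([text_feat, image_feat], dim=-1))
```

Swapping efficientnet_b0 for a torchvision ResNet (e.g. resnet50 with its fc layer replaced by nn.Identity(), giving 2048-d features) would give the mBERT + ResNet variant mentioned in the abstract.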
Related papers
- Boosting Accuracy and Interpretability in Multilingual Hate Speech Detection Through Layer Freezing and Explainable AI [0.0]
This study examines the performance of three transformer-based models for sentiment analysis and hate speech detection. The evaluation is conducted across five languages: English, Korean, Japanese, Chinese, and French. We integrate the Local Interpretable Model-agnostic Explanations (LIME) framework to highlight the contribution of individual words to the models' decisions.
arXiv Detail & Related papers (2026-01-06T04:07:51Z)
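A minimal sketch of the layer-freezing idea, assuming a Hugging Face XLM-R classifier and an arbitrary cut-off of eight frozen encoder layers; the study's actual models and freezing depth may differ.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)

# Freeze the embeddings and the lower encoder layers so that only the
# top layers and the classification head are updated during fine-tuning.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:8]:  # cut-off of 8 is an assumption
    for param in layer.parameters():
        param.requires_grad = False
```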
- Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation [48.63200319578052]
We develop a taxonomy of the types of ambiguity that arise in this task and propose metrics to quantify them. Our work also explores how multi-turn dialogue can reduce ambiguity and thereby improve code accuracy by better matching user goals.
arXiv Detail & Related papers (2025-10-10T13:44:40Z)
- Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z)
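A hypothetical zero-shot prompt along these lines; the wording and label set are illustrative, not taken from the study.

```python
def build_zero_shot_prompt(comment: str, language: str) -> str:
    """Illustrative zero-shot hate speech prompt; wording is an assumption."""
    return (
        f"You are a content moderation assistant. The following {language} "
        "comment may contain hate speech.\n"
        f"Comment: {comment}\n"
        "Answer with exactly one word: HATEFUL or NOT_HATEFUL."
    )

# A few-shot variant would prepend labeled example comments before the query.
print(build_zero_shot_prompt("tum sab bekar ho", "Hindi-English"))
```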
- A Unified Multi-Task Learning Architecture for Hate Detection Leveraging User-Based Information [23.017068553977982]
Hate speech, offensive language, aggression, racism, sexism, and other abusive language are common phenomena in social media.
There is a need for Artificial Intelligence (AI)-based intervention that can filter hate content at scale.
This paper introduces a model that improves hate speech identification for the English language by utilising intra-user and inter-user information.
arXiv Detail & Related papers (2024-11-11T10:37:11Z)
- Breaking the Silence: Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks.
The CNN captures localized features indicative of abusive language through convolution filters applied to the embedded input text.
The BiLSTM then analyzes the resulting sequence for dependencies among words and phrases.
Validation scores showed strong performance across F1-measures, especially for English (0.84).
arXiv Detail & Related papers (2024-04-02T14:55:47Z)
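A compact PyTorch sketch of a CNN + BiLSTM classifier of this kind; the embedding size, filter width, and hidden sizes are assumptions rather than the team's configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """CNN + BiLSTM text classifier sketch (hyperparameters assumed)."""

    def __init__(self, vocab_size: int = 30000, emb_dim: int = 300,
                 num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Convolution over token embeddings captures local abusive phrases.
        self.conv = nn.Conv1d(emb_dim, 128, kernel_size=3, padding=1)
        # BiLSTM models longer-range dependencies over the convolved sequence.
        self.lstm = nn.LSTM(128, 64, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, token_ids):                      # (B, T)
        x = self.embed(token_ids).transpose(1, 2)      # (B, E, T)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # (B, T, 128)
        _, (h, _) = self.lstm(x)                       # h: (2, B, 64)
        h = torch.cat([h[0], h[1]], dim=-1)            # (B, 128)
        return self.fc(h)
```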
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z)
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu, and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
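A hedged sketch of how such a fine-tune might be set up with parameter-efficient LoRA adapters via the peft library; the paper's exact training recipe is not given here, so the model name, target modules, and hyperparameters below are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Llama 2 7B wrapped for binary classification; LoRA keeps most weights frozen.
base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=2
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed hyperparameters
    target_modules=["q_proj", "v_proj"],     # attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only adapter weights are trainable
```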
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most commonly used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
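A toy illustration of the kind of keyword camouflage being simulated (leetspeak-style character substitutions); the article's actual tools are more elaborate.

```python
import random

# Common leetspeak substitutions used to twist moderated keywords.
SUBSTITUTIONS = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def camouflage(word: str, prob: float = 0.7) -> str:
    """Randomly substitute characters to evade naive keyword filters."""
    return "".join(
        SUBSTITUTIONS[ch] if ch in SUBSTITUTIONS and random.random() < prob else ch
        for ch in word
    )

print(camouflage("misogyny"))  # e.g. "m1$0gyny"
```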
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
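The token-free idea can be made concrete by encoding raw UTF-8 bytes instead of vocabulary indices; this is a deliberate simplification of Charformer's gradient-based subword tokenization, shown only to illustrate what forgoing a static vocabulary means.

```python
def encode_bytes(text: str, max_len: int = 512) -> list[int]:
    """Map text to padded UTF-8 byte IDs; no vocabulary or tokenizer needed."""
    ids = list(text.encode("utf-8"))[:max_len]
    return ids + [0] * (max_len - len(ids))

# Works identically for any language or script, e.g. code-mixed Hindi-English.
print(encode_bytes("yeh toxic comment hai", max_len=24))
```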
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper, we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
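One standard mitigation for such label imbalance is inverse-frequency class weighting in the loss; a minimal sketch follows (the paper may use a different scheme).

```python
import numpy as np

def class_weights(labels: np.ndarray) -> np.ndarray:
    """Inverse-frequency weights: the rare (hate) class gets a larger weight."""
    counts = np.bincount(labels)
    return counts.sum() / (len(counts) * counts)

# e.g. 90% non-hate vs 10% hate examples
print(class_weights(np.array([0] * 90 + [1] * 10)))  # -> [0.556, 5.0]
```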