AIWizards at MULTIPRIDE: A Hierarchical Approach to Slur Reclamation Detection
- URL: http://arxiv.org/abs/2602.12818v1
- Date: Fri, 13 Feb 2026 11:01:19 GMT
- Title: AIWizards at MULTIPRIDE: A Hierarchical Approach to Slur Reclamation Detection
- Authors: Luca Tedeschini, Matteo Fasulo
- Abstract summary: We propose a hierarchical approach to modeling the slur reclamation process. Our core assumption is that members of the LGBTQ+ community are more likely to employ certain slurs in a reclamatory manner. Experimental results on Italian and Spanish show that our approach performs statistically comparably to a strong BERT-based baseline.
- Score: 0.42970700836450487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting reclaimed slurs represents a fundamental challenge for hate speech detection systems, as the same lexical items can function either as abusive expressions or as in-group affirmations depending on social identity and context. In this work, we address Subtask B of the MultiPRIDE shared task at EVALITA 2026 by proposing a hierarchical approach to modeling the slur reclamation process. Our core assumption is that members of the LGBTQ+ community are more likely, on average, to employ certain slurs in a reclamatory manner. Based on this hypothesis, we decompose the task into two stages. First, using weakly supervised LLM-based annotation, we assign fuzzy labels to users indicating the likelihood of belonging to the LGBTQ+ community, inferred from the tweet and the user bio. These soft labels are then used to train a BERT-like model to predict community membership, encouraging the model to learn latent representations associated with LGBTQ+ identity. In the second stage, we integrate this latent space with a newly initialized model for the downstream slur reclamation detection task. The intuition is that the first model encodes user-oriented sociolinguistic signals, which are then fused with representations learned by a model pretrained for hate speech detection. Experimental results on Italian and Spanish show that our approach achieves performance statistically comparable to a strong BERT-based baseline, while providing a modular and extensible framework for incorporating sociolinguistic context into hate speech modeling. We argue that more fine-grained hierarchical modeling of user identity and discourse context may further improve the detection of reclaimed language. We release our code at https://github.com/LucaTedeschini/multipride.
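The two-stage design described in the abstract (a stage-1 encoder trained on soft community-membership labels, fused with a stage-2 encoder for reclamation detection) can be sketched roughly as follows. This is an illustrative stand-in, not the authors' released code: the class name, the tiny placeholder encoders, and the choice to freeze the stage-1 model are assumptions; see the linked GitHub repository for the actual implementation.

```python
import torch
import torch.nn as nn


class HierarchicalReclamationClassifier(nn.Module):
    """Sketch of the two-stage fusion: an encoder already trained on
    soft LGBTQ+ community-membership labels (stage 1) supplies
    user-oriented sociolinguistic features, which are concatenated with
    features from a hate-speech encoder (stage 2) before the final
    slur-reclamation classifier."""

    def __init__(self, identity_encoder, hate_encoder, hidden_dim, n_classes=2):
        super().__init__()
        self.identity_encoder = identity_encoder
        # Assumption: the stage-1 latent space is kept fixed during stage 2.
        for p in self.identity_encoder.parameters():
            p.requires_grad = False
        self.hate_encoder = hate_encoder
        self.classifier = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, x):
        with torch.no_grad():
            identity_repr = self.identity_encoder(x)  # sociolinguistic signals
        hate_repr = self.hate_encoder(x)              # hate-speech features
        fused = torch.cat([identity_repr, hate_repr], dim=-1)
        return self.classifier(fused)


# Tiny placeholder encoders; the real setup would use BERT-like models
# producing sentence-level representations.
hidden = 16
identity_enc = nn.Sequential(nn.Linear(32, hidden), nn.Tanh())
hate_enc = nn.Sequential(nn.Linear(32, hidden), nn.Tanh())
model = HierarchicalReclamationClassifier(identity_enc, hate_enc, hidden)
logits = model(torch.randn(4, 32))  # batch of 4 "tweet" feature vectors
```

In stage 1, the identity encoder would be trained against the fuzzy LLM-derived labels (e.g. with a soft-target cross-entropy loss); only the hate-speech encoder and the final linear layer receive gradients in stage 2.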
Related papers
- SLAyiNG: Towards Queer Language Processing [44.4984082814346]
SLAyiNG is the first dataset containing annotated queer slang derived from subtitles, social media posts, and podcasts. We describe our data curation process, including the collection of slang terms and definitions, and the scraping of sources for examples that reflect usage of these terms. As preliminary results, we calculate inter-annotator agreement for human annotators and OpenAI's model o3-mini.
arXiv Detail & Related papers (2025-09-22T07:41:45Z) - A Unified Multi-Task Learning Architecture for Hate Detection Leveraging User-Based Information [23.017068553977982]
Hate speech, offensive language, aggression, racism, sexism, and other abusive language are common phenomena in social media.
There is a need for Artificial Intelligence (AI)-based intervention which can filter hate content at scale.
This paper introduces a unique model that improves hate speech identification for the English language by utilising intra-user and inter-user-based information.
arXiv Detail & Related papers (2024-11-11T10:37:11Z) - Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A primary concern regarding training large language models (LLMs) is whether they abuse copyrighted online text. We propose an alternative insert-and-detect methodology, advocating that web users and content platforms employ unique identifiers for reliable and independent membership inference.
arXiv Detail & Related papers (2024-03-23T06:36:32Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging? [94.68962249604749]
We propose a Masked Part-of-Speech Model (MPoSM) to facilitate flexible dependency modeling.
MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.
We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages.
arXiv Detail & Related papers (2022-06-30T01:43:05Z) - Rethinking the Two-Stage Framework for Grounded Situation Recognition [61.93345308377144]
Grounded Situation Recognition is an essential step towards "human-like" event understanding.
Existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.
We propose a novel SituFormer for GSR which consists of a Coarse-to-Fine Verb Model (CFVM) and a Transformer-based Noun Model (TNM).
arXiv Detail & Related papers (2021-12-10T08:10:56Z) - Contextual Multi-View Query Learning for Short Text Classification in User-Generated Data [6.052423212814052]
COCOBA employs the context of user postings to construct two views.
It then uses the distribution of the representations in each view to detect the regions that are assigned to the opposite classes.
Our model also employs a query-by-committee model to address the usually noisy language of user postings.
arXiv Detail & Related papers (2021-12-05T16:17:21Z) - Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: Hate, Offensive, and Neither.
We create a class module which contains main functionality including text classification, sentiment checking and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z) - An audiovisual and contextual approach for categorical and continuous emotion recognition in-the-wild [27.943550651941166]
We tackle the task of video-based audio-visual emotion recognition in the context of the 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW).
Standard methodologies that rely solely on the extraction of facial features often fall short of accurate emotion prediction in cases where the aforementioned source of affective information is inaccessible due to head/body orientation, low resolution and poor illumination.
We aspire to alleviate this problem by leveraging bodily as well as contextual features, as part of a broader emotion recognition framework.
arXiv Detail & Related papers (2021-07-07T20:13:17Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application [63.10266319378212]
We propose a method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT).
We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers.
arXiv Detail & Related papers (2020-09-22T02:15:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.